Teddy Petrou is the founder of Dunder Data, a company that specializes in helping students become experts at data science using Python. Teddy is the author of multiple highly rated texts such as Pandas Cookbook, Master the Fundamentals of Python, and Master Data Analysis with Python. Teddy has taught Python and data science to hundreds of students in in-person classroom settings. He sees firsthand where students struggle and continually improves his material to minimize these struggles by providing simple, direct paths forward. Teddy holds a master's degree in statistics from Rice University.
Python software engineers, data scientists, and anyone interested in building a complete Python library.
We will download and install VS Code, an excellent free integrated development environment, then set up our environment and learn about test-driven development. Next, we inspect the __init__.py file, which initially holds all of our library code, and learn how to import our library into a Jupyter Notebook. Finally, we begin constructing the DataFrame by checking the types of its inputs.
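To illustrate what checking input types during construction can look like, here is a minimal sketch. It assumes, purely for illustration, that a DataFrame is built from a dictionary mapping string column names to Python lists of equal length; the class below is hypothetical and is not the course's actual code.

```python
class DataFrame:
    """Hypothetical minimal DataFrame: a dict of equal-length column lists."""

    def __init__(self, data):
        # The input must be a dictionary of columns
        if not isinstance(data, dict):
            raise TypeError("`data` must be a dictionary")
        lengths = set()
        for name, values in data.items():
            # Column names must be strings
            if not isinstance(name, str):
                raise TypeError("column names must be strings")
            # Column values must be lists (a stand-in for 1-D arrays here)
            if not isinstance(values, list):
                raise TypeError("column values must be lists")
            lengths.add(len(values))
        # Every column must have the same number of rows
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        # Copy each column so later changes to the input do not leak in
        self._data = {name: list(values) for name, values in data.items()}

    def __len__(self):
        # Number of rows; zero when there are no columns
        return len(next(iter(self._data.values()), []))
```

Rejecting bad input in the constructor means every other method can assume it is working with well-formed columns.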
We will implement several basic DataFrame properties, such as access to the columns, values, and data types, and give the DataFrame a clean visual representation. We'll then learn how to select subsets of data with square brackets.
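A minimal sketch of these properties, a plain-text display, and square-bracket selection follows. It assumes the same hypothetical dict-of-lists storage as above; the data type of a column is taken from its first value, which assumes homogeneous columns. Names and return shapes are illustrative, not the course's API.

```python
class DataFrame:
    """Hypothetical minimal DataFrame storing a dict of column lists."""

    def __init__(self, data):
        # Input validation omitted to keep the sketch short
        self._data = {name: list(values) for name, values in data.items()}

    @property
    def columns(self):
        # Column names in insertion order
        return list(self._data)

    @property
    def values(self):
        # Row-oriented view: one inner list per row
        return [list(row) for row in zip(*self._data.values())]

    @property
    def dtypes(self):
        # Type name of the first value in each column (assumes homogeneous columns)
        return {name: type(vals[0]).__name__ if vals else None
                for name, vals in self._data.items()}

    def __repr__(self):
        # Plain-text table: header line, then one tab-separated line per row
        lines = ["\t".join(self.columns)]
        lines += ["\t".join(str(v) for v in row) for row in self.values]
        return "\n".join(lines)

    def __getitem__(self, key):
        # df["a"] selects one column; df[["a", "b"]] selects several
        if isinstance(key, str):
            return DataFrame({key: self._data[key]})
        if isinstance(key, list):
            return DataFrame({k: self._data[k] for k in key})
        raise TypeError("key must be a string or a list of strings")
```

In this sketch, selecting a single column still returns a one-column DataFrame, which keeps the return type of `__getitem__` consistent.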
We’ll add several methods that give our DataFrame powerful features: aggregating data, detecting missing values, and finding unique values.
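The sketch below shows one way these three features could work on the hypothetical dict-of-lists DataFrame. Aggregations apply a function to every column and return a one-row DataFrame. Treating both `None` and float NaN as missing is an assumption made here. `unique` returns a plain dict because columns can have different numbers of unique values.

```python
class DataFrame:
    """Hypothetical minimal DataFrame storing a dict of column lists."""

    def __init__(self, data):
        # Input validation omitted to keep the sketch short
        self._data = {name: list(values) for name, values in data.items()}

    def _agg(self, func):
        # Apply `func` to each column and return a one-row DataFrame
        return DataFrame({name: [func(vals)] for name, vals in self._data.items()})

    def sum(self):
        return self._agg(sum)  # the builtin sum, not this method

    def max(self):
        return self._agg(max)  # the builtin max, not this method

    def isna(self):
        # None or NaN counts as missing; NaN is the only value not equal to itself
        return DataFrame({name: [v is None or v != v for v in vals]
                          for name, vals in self._data.items()})

    def unique(self):
        # Unique values per column, preserving first-seen order
        return {name: list(dict.fromkeys(vals)) for name, vals in self._data.items()}
```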
We’ll add several more methods, giving our DataFrame the ability to rename and drop columns, sort by column values, take random samples, create pivot tables, and more.
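Renaming, dropping, and sorting can each return a new DataFrame built from the old columns. The following is a minimal sketch on the same hypothetical dict-of-lists storage; the method names mirror pandas for familiarity, but the signatures here are simplified assumptions. Sorting computes the row order once from the `by` column and then reorders every column with it, so rows stay aligned.

```python
class DataFrame:
    """Hypothetical minimal DataFrame storing a dict of column lists."""

    def __init__(self, data):
        # Input validation omitted to keep the sketch short
        self._data = {name: list(values) for name, values in data.items()}

    def rename(self, columns):
        # `columns` maps old names to new names; unmapped columns keep their name
        if not isinstance(columns, dict):
            raise TypeError("`columns` must be a dictionary")
        return DataFrame({columns.get(name, name): vals
                          for name, vals in self._data.items()})

    def drop(self, columns):
        # Accept a single column name or a list of names
        if isinstance(columns, str):
            columns = [columns]
        return DataFrame({name: vals for name, vals in self._data.items()
                          if name not in columns})

    def sort_values(self, by, ascending=True):
        # Row positions ordered by the `by` column (Python's sort is stable)
        col = self._data[by]
        order = sorted(range(len(col)), key=col.__getitem__, reverse=not ascending)
        # Reorder every column with the same positions so rows stay aligned
        return DataFrame({name: [vals[i] for i in order]
                          for name, vals in self._data.items()})
```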
We’ll add the ability to read text data from files. This is challenging because text data can be formatted in a wide variety of ways.
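As a rough illustration, the sketch below reads delimited text with Python's standard `csv` module and converts each column to `int`, then `float`, falling back to strings. It assumes the first row holds the column names and returns a plain dict of columns; `parse_csv` and `read_csv` are hypothetical names here, and the sketch does not handle missing values, type-mixed columns, or other formatting quirks a full reader must deal with.

```python
import csv


def _convert(values):
    """Convert a column of strings to int, then float, else leave as str."""
    for cast in (int, float):
        try:
            return [cast(v) for v in values]
        except ValueError:
            continue  # at least one value failed; try the next type
    return values


def parse_csv(lines, sep=","):
    """Parse an iterable of text lines into a dict of typed columns."""
    reader = csv.reader(lines, delimiter=sep)
    header = next(reader)  # first row holds the column names
    columns = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(value)
    return {name: _convert(values) for name, values in columns.items()}


def read_csv(path, sep=","):
    # The csv module documentation recommends opening files with newline=""
    with open(path, newline="") as f:
        return parse_csv(f, sep)
```

Separating parsing from file opening makes the parser easy to test on in-memory lines.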
By the end, you’ll have a complete data analysis library with many of the same capabilities as pandas, and your code will have passed at least 100 tests along the way. We’ll discuss possible additional features and next steps you can take to improve the library.