Writing Parquet Files in Python with Pandas, PySpark, and Koalas
This blog post shows how to convert a CSV file to Parquet with Pandas, Spark, PyArrow and Dask. It discusses the pros and cons of each approach and explains how both approaches can happily coexist in...
Deep dive into how pyenv actually works by leveraging the shim design pattern
pyenv lets you manage multiple versions of Python on your computer. This blog post focuses on how pyenv uses the shim design pattern to provide a wonderful user experience (it doesn’t focus on...
Building DAGs / Directed Acyclic Graphs with Python
Directed Acyclic Graphs (DAGs) are a critical data structure for data science / data engineering workflows. DAGs are used extensively by popular projects like Apache Airflow and Apache Spark. This...
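As a rough illustration of why DAGs suit workflow tools, here is a minimal sketch that orders tasks so each one runs after its prerequisites. It uses the standard library's `graphlib` module (Python 3.9+), which is an assumption on our part and not necessarily what the post uses; the `execution_order` helper and the `pipeline` task names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError


def execution_order(dependencies):
    """Return one valid run order for a DAG.

    `dependencies` maps each node to the set of nodes it depends on.
    Raises CycleError if the graph contains a cycle (i.e. is not a DAG).
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: each task maps to the tasks that must finish first.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}

order = execution_order(pipeline)
print(order)

# A graph with a cycle cannot be ordered, so it is rejected.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("not a DAG")
```

The "acyclic" part is what makes the ordering possible: once a cycle exists, no task in it can be scheduled first.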
Amazing Python Data Workflow with Poetry, Pandas, and Jupyter
Poetry makes it easy to install Pandas and Jupyter for data analysis. It is a robust dependency management system that also makes Python libraries accessible in Jupyter notebooks....
Splitting Large CSV files with Python
This blog post demonstrates different approaches for splitting a large CSV file into smaller CSV files and outlines the costs / benefits of the different approaches. TL;DR It’s faster to split a CSV...
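For a sense of what one such approach might look like, here is a minimal sketch that splits a CSV into fixed-size chunks with only the standard library, repeating the header in every output file. The `split_csv` helper, its parameters, and the `part_0001.csv` naming scheme are hypothetical, and this sketch makes no claim about which approach the post found fastest.

```python
import csv
from itertools import islice
from pathlib import Path


def split_csv(source, out_dir, rows_per_file):
    """Split `source` into numbered CSV files in `out_dir`.

    Each output file holds the header row plus at most `rows_per_file`
    data rows. Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input: nothing to split
            return written
        part = 0
        while True:
            # Pull the next chunk lazily so the whole file is never in memory.
            chunk = list(islice(reader, rows_per_file))
            if not chunk:
                break
            part += 1
            target = out_dir / f"part_{part:04d}.csv"
            with open(target, "w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(target)
    return written
```

Calling `split_csv("big.csv", "chunks", 100_000)` would write `chunks/part_0001.csv`, `chunks/part_0002.csv`, and so on; the file names here are placeholders.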