3 Tricks to Make Your Python Projects More Sophisticated

Sharing tricks that helped me in data engineering with Python

Data engineering is a fascinating field. We are dealing with a variety of tools, databases, data sources in different forms and shapes, and ETL jobs processing vast amounts of data every day. Due to the diversity of tasks and technologies, it pays off to know some useful tricks to make you more productive with respect to data processing and code deployments. In this article, we’ll look at three tricks that will make your Python projects more efficient.

When reading data from flat files, many data engineers use libraries such as pathlib and shutil to create directories and remove them at the end of the script to ensure that the data pipeline remains idempotent, i.e., that a subsequent run can be executed without any undesirable side effects.

What are the typical use cases for a temporary directory? One is when your data resides in S3 or on a remote server and you first have to download it for further processing, perhaps eventually loading the transformed data into a data warehouse or some other database. A temporary directory is also useful if you need to store data from an external API, or if you want to cache data in some temporary location.

Here is an example of what this could look like using pathlib and shutil:

The script first creates a directory data under our root project directory. Just before the script ends, it removes this directory again to ensure that we don't accidentally ingest the same data more than once (an idempotency issue).
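
The create-then-remove pattern described above might look roughly like the following. This is a minimal sketch, not the article's own listing: the function name run_pipeline, the file name extract.csv, and the inline sample rows are hypothetical stand-ins for a real download and load step, and the try/finally wrapper is an assumption added here so that cleanup also happens when processing fails.

```python
import shutil
from pathlib import Path


def run_pipeline(root: Path) -> int:
    """Create a data directory under root, use it, then remove it again."""
    data_dir = root / "data"
    # exist_ok=True lets a rerun proceed even if an earlier run left the directory behind.
    data_dir.mkdir(parents=True, exist_ok=True)
    try:
        # Hypothetical stand-in for downloading a file from S3 or an API.
        extract = data_dir / "extract.csv"
        extract.write_text("id,value\n1,10\n2,20\n")

        # Hypothetical stand-in for the transform/load step: count the data rows.
        rows = extract.read_text().splitlines()[1:]
        return len(rows)
    finally:
        # Remove the directory so the next run starts from a clean state.
        shutil.rmtree(data_dir, ignore_errors=True)
```

Calling run_pipeline(Path.cwd()) would create ./data, process the sample file, and delete the directory again before returning the row count.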

The above pattern works fine. However, a more sophisticated and robust approach is to use a temporary directory. Python makes this easy with the built-in tempfile module. Here…
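
As a rough illustration of the tempfile approach, the sketch below uses tempfile.TemporaryDirectory from the standard library as a context manager. The function name run_pipeline_with_tempdir, the file name extract.csv, and the sample rows are hypothetical, and the article's own example may differ.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def run_pipeline_with_tempdir() -> Tuple[int, Path]:
    """Process a file inside a temporary directory that cleans itself up."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)

        # Hypothetical stand-in for downloading a file from S3 or an API.
        extract = data_dir / "extract.csv"
        extract.write_text("id,value\n1,10\n2,20\n")

        # Hypothetical stand-in for the transform/load step: count the data rows.
        row_count = len(extract.read_text().splitlines()) - 1

    # Leaving the with block deletes the directory and everything in it,
    # even if an exception was raised inside the block.
    return row_count, data_dir
```

The returned path is only there to show that the directory no longer exists afterwards. Because the operating system picks a unique location for each run, there is no fixed data directory to collide with and no explicit shutil.rmtree call to forget.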
