Left To Our Own Devices

When does social media become anti-social? Is Tiktok the most addictive social media platform in internet history? An exploration of Tiktok, and it’s unyielding power over societal youth. College…

Smartphone

独家优惠奖金 100% 高达 1 BTC + 180 免费旋转




Data Engineering Good Practices to Database Queries

Optimization is essential for reducing the usage of system resources that are
required in order to fulfill a task while also allowing your application to handle more queries simultaneously. Thus, you can improve the overall system performance and help users get results faster. Poorly written queries can cause loss of service for the users, consume too many system resources or take too long to execute.

This article shows a collection of good practices to help you make better project decisions, some of their drawbacks and real use cases.

This technique is extremely useful when the table contains a specific column that is used as a reference (source of truth) for further queries. In that scenario, the column is a good candidate to have an index applied.

With this simple example is already possible to obtain significant gains in speed, but keep in mind that for this and other examples showed in this article there is a multitude of variations and fine-grained parameters that can be applied.

It is important to mention that there is no silver bullet solution for optimization problems.

Creating more indexes in a table can speed up when reading data, but will slow down the writing of new rows to the table. Hence, it is important to understand what kind of operations will be performed.

As many other topics in this article, deciding the order of join clauses is hard.

When developing long and complex queries it is common to use ORDER BY statements to visualize the return of intermediary query results. If that is the case, after the development phase is finished remember to remove all the ORDER BY statements from sub-queries since it will only slow down your query execution time. In later sections we will discuss exceptional scenarios where this clause is useful.

String transformations are a common operation in databases specially when trying to compare/filter out some values in a query. Generally, this kind of operation is not costly, but it is not advisable to use them in large amount of values or in sub-queries that are called frequently.

This article won’t go deep into the importance of standards in software engineering, but a real example might be enough to convince you to adopt a standard when storing strings in your database: It was observed a speed up of around 6 times after removing the transformations regarding string conversion to lowercase.

Always adding UPPER() or LOWER() to convert text during a query will heavily impact performance. But how to apply this transformation after the project is deployed? How to change large amount of values that are already inserted in the database with minimal changes? The next topic shows an approach to handle this situation.

DEF: Anonymous blocks are PL/SQL blocks which do not have any names assigned to them. They need to be created and used in the same session therefore they will not be stored in the server as database objects. Since they do not need store in the database, they need no compilation steps. They are written and executed directly, and compilation and execution happen in the same process.

Sometimes there are unexpected outcomes that are not mapped during the planning phase of a project. In those situations anonymous code blocks can be helpful to apply data transformations without making big changes to the database. The following query is used to exemplify a real case:

The above query is used to set columns of text type from table0, table1, and table2 that are not sku to lowercase.

Part of the refactor on string standardization mentioned in the previous topic came from applying this anonymous code block given that the requirement of storing lowercase strings in the database was not described during the planning phase of the project.

When does one actually need to use a `WITH` clause? Well, there are a few unique use cases. Most of them are geared towards ease of query development and maintenance. The main use cases for the `WITH` clause are twofold:

The second approach is tempting and might be useful for implementing complex business logic in pure SQL. But it can lead to a procedural mindset and lose the power of sets and joins.

It is important to keep in mind that the result of a WITH clause is not stored anywhere in the database schema. It only exists while the query is being executed.

When you need an intermediate computation to be persistent, i.e. stored in the database, it might be interesting to use a materialized view. The main thing that sets a materialized view apart is that it is a copy of a query result that does not run in real-time.

Example of a materialized view

The drawbacks are that it will use a little more storage space, since it creates a new table, and you have to make sure that the view has the most recent data. You can refresh it manually, or set it to refresh on schedule or by triggers in order to reduce the risk of viewing obsolete data.

Another real example is that using a materialized view combined with good indexes led to a speed up of around 10 times.

Similar to an aggregate function — a function that transforms a set of rows into a single row — , a window function operates on a set of rows. However, it does not reduce the amount of rows returned by the query.

The term window define the set of rows on which the window function is executed. This is useful especially when you have to apply calculations over data with intrinsic ordering like a time-series.

It also allows you to control how the window will be applied over the partition of rows selected. The figure bellow shows the possible values for delimiting a window frame:

Boundaries for delimiting a window frame

Query Result

With this we are able to solve complex business logic with SQL since you can apply many aggregate functions to a window and handle the ordering of the rows as you wish.

In this article, we’ve discussed some basic query optimization techniques in PostgreSQL. The most important thing is to learn the principles on how to use them and to grasp the nuances of working with main database objects.

The examples in this article were written in PostgreSQL.

Add a comment

Related posts:

Rising Star Hannyta On The Five Things You Need To Shine In The Music Industry

As a part of our series about rising music stars, I had the distinct pleasure of interviewing Hannyta. Hannyta is a young musician, singer, songwriter still in High School and she is currently based…

The Art of Stock Market Trend Estimation

There are 5 basic and most important aspects of the economy investors need to consider before choosing any investment and they are market volatility, interest rates, inflation, bond yield and commodity prices.