PostgreSQL Indexes Can Hurt Performance: Exploring the Negative Effects and the Costs Involved (2024)

This blog was originally published in April 2023 and was updated in April 2024.

Indexes are often considered a panacea for SQL performance tuning, and PostgreSQL supports different types of indexes catering to different use cases. I keep seeing many articles and talks on “tuning” that discuss how creating new indexes speeds up SQL, but rarely ones discussing removing them. The urge to create more and more indexes causes severe damage in many systems. Many times, removing indexes is what we should do first, before considering any new indexes, for the benefit of the entire system. Surprised? Understanding the consequences and overhead of indexes can help us make an informed decision and save the system from many potential problems.

At a very basic level, we should remember that indexes are not free of cost. The benefits come with a cost in terms of performance and resource consumption. The following is the list of ten problems/overheads that the excessive use of indexes can cause. This post is about PostgreSQL, but most of the problems also apply to other database systems.

10 Ways Excessive Indexes Can Hurt PostgreSQL Performance

1. Indexes penalize the transactions

We might see an improvement in the performance of a SELECT statement after adding an index. But we should not forget that the performance gains come with a cost to the transactions on the same table. Conceptually, every DML on a table needs to update all the indexes of the table. Even though there are a lot of optimizations for reducing the write amplification, it is a considerable overhead.

For example, let’s assume that there are five indexes on a table; every INSERT into the table will also insert an entry into each of those five indexes. Logically, five index pages will also be updated. So, effectively, the overhead is 5x.
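As a sketch of this write amplification (the table and index names here are hypothetical), PostgreSQL 13 and later can show the extra buffer activity and WAL volume of an INSERT directly via EXPLAIN with the BUFFERS and WAL options:

```sql
-- Illustrative only: a table with several indexes means every INSERT
-- must also add an entry to each of those indexes.
CREATE TABLE orders (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer   int,
    status     text,
    created_at timestamptz,
    amount     numeric
);
CREATE INDEX ON orders (customer);
CREATE INDEX ON orders (status);
CREATE INDEX ON orders (created_at);
CREATE INDEX ON orders (amount);

-- EXPLAIN (ANALYZE, BUFFERS, WAL) on an INSERT reports the buffers
-- dirtied and WAL bytes generated, including the index updates:
EXPLAIN (ANALYZE, BUFFERS, WAL)
INSERT INTO orders (customer, status, created_at, amount)
VALUES (42, 'new', now(), 9.99);
```

Running the same EXPLAIN before and after dropping a few of the indexes makes the per-index cost visible.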

2. Memory usage

Index pages must be in memory regardless of whether any query uses them, because transactions need to update them. Effectively, less memory remains available for caching the table’s pages. The more indexes there are, the more memory is required for effective caching. If we don’t increase the available memory, this starts hurting the performance of the entire system.
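One way to see how much of shared_buffers indexes actually occupy is the pg_buffercache contrib extension. This is a sketch assuming the default 8 kB block size:

```sql
-- Requires the pg_buffercache extension (shipped with PostgreSQL contrib).
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- What occupies shared_buffers? Indexes often hold a large share.
-- Assumes the default 8192-byte block size.
SELECT c.relname,
       c.relkind,                       -- 'r' = table, 'i' = index
       pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname, c.relkind
ORDER BY count(*) DESC
LIMIT 15;
</imports>
```

If rows with relkind = 'i' dominate the top of this list, index pages are crowding table pages out of the cache.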

3. Random writes: Updating indexes is more costly

Unlike an INSERT into a table, where new rows are typically appended to the same (last) pages, index entries must go wherever their key values belong, so consecutive writes are far less likely to land on the same page. B-Tree indexes in particular are known to cause more random writes.

4. Indexes need more cache than tables

Due to random writes and reads, indexes need more pages to be in the cache. Cache requirements for indexes are generally much higher than associated tables.

5. WAL generation

In addition to WAL records of the table updates, there will also be WAL records for indexes. This helps in crash recovery and replication. If you are using any wait event analysis tools/scripts like pg_gather, the overhead of the WAL generation will be clearly visible. The actual impact depends on the index type.
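A simple way to quantify this is to capture the WAL position before and after a workload and take the difference. This is a psql-specific sketch (\gset is a psql feature), and the table t and the INSERT are placeholders for your own workload; repeating it with and without extra indexes shows the additional WAL they generate:

```sql
-- Record the current WAL insert position into a psql variable.
SELECT pg_current_wal_lsn() AS lsn_before \gset

-- Placeholder workload: replace with your own statements.
INSERT INTO t SELECT generate_series(1, 100000);

-- WAL bytes generated by the workload above.
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'lsn_before')
       ) AS wal_generated;
```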

[Screenshot: pg_gather wait event analysis showing WAL-related wait events among the top waits]

This is a synthetic test case, but if WAL-related wait events appear as any of the top wait events, it is a matter of concern for a transaction system, and we should take every step to address it.

Watch our Free Webinar: A Deep Dive into PostgreSQL Indexing

6. More and more I/O

It is not just WAL records that are generated; more pages get dirtied as well. As index pages get dirtied, they must be written back to the data files, leading to more I/O again: the “DataFileWrite” wait event, as seen in the previous screenshot.

Another side effect is that indexes increase the total Active-Dataset size. By “Active-Dataset,” I mean the tables and indexes that are frequently queried and used. As the size of the active dataset increases, the cache becomes less and less efficient. A less-effective cache results in more data file reads, so read I/O increases. This is in addition to the read I/O required to bring the additional index pages from storage for specific queries.

Again, the pg_gather report of another system, with mainly SELECT queries, shows this problem. As the Active-Dataset grows, PostgreSQL has no choice but to bring pages in from storage.

[Screenshot: pg_gather wait event analysis of a read-mostly system showing a high percentage of DataFileRead waits]

A more significant percentage of “DataFileRead” sustained for a longer duration indicates that the Active-Dataset is much bigger than what is cacheable.
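The per-index cache hit ratio from the cumulative statistics views gives a rough per-object view of the same symptom. A sketch using the standard pg_statio_user_indexes view:

```sql
-- Cache hit ratio per index: a low hit_pct means index pages are
-- repeatedly read from storage (the DataFileRead pattern).
SELECT indexrelname,
       idx_blks_hit,
       idx_blks_read,
       round(100.0 * idx_blks_hit
             / NULLIF(idx_blks_hit + idx_blks_read, 0), 2) AS hit_pct
FROM pg_statio_user_indexes
ORDER BY idx_blks_read DESC
LIMIT 10;
```

Note that these counters are cumulative since the last statistics reset, so they reflect long-term behavior rather than the current moment.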

7. Impact on VACUUM/AUTOVACUUM

The overhead is not only in inserting or updating index pages, as discussed in the previous points. There is also maintenance overhead, since indexes need cleanup of old (dead) tuple references as well.

I have seen cases where autovacuum workers on a single table run for a very long duration because of the size of the table and, most importantly, the excessive number of indexes on it. In fact, it is common for users to see an autovacuum worker apparently “stuck” for hours without showing any progress. This happens because index cleanup is the most opaque stage of autovacuum: views like pg_stat_progress_vacuum report nothing beyond the phase being “vacuuming indexes.”

Indexes can get bloated and become less efficient over time. Periodic index maintenance (REINDEX) might be needed in many systems.
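To watch a long-running (auto)vacuum and to rebuild a bloated index, the standard tools are the pg_stat_progress_vacuum view and REINDEX CONCURRENTLY (PostgreSQL 12+). The index name below is a placeholder:

```sql
-- Monitor running (auto)vacuum workers; during index cleanup the
-- phase column only reports 'vacuuming indexes'.
SELECT p.pid, c.relname, p.phase,
       p.heap_blks_scanned, p.heap_blks_total, p.index_vacuum_count
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;

-- Rebuild a bloated index without blocking writes (PostgreSQL 12+).
-- Cannot run inside a transaction block.
REINDEX INDEX CONCURRENTLY my_bloated_index;
```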

8. Tunnel vision while tuning

Tunnel vision is the loss of peripheral vision. While concentrating on a particular SQL statement in an attempt to “tune” it, the user may decide to create indexes for it. By creating an index to tune one query, we shift more system resources toward that query. It may then perform better, but at the cost of penalizing others.

But as we keep creating more and more indexes to tune other queries, the resources shift again toward those queries. This leads to a situation where the effort to tune every query penalizes every other query. Ultimately, everything is hurt, and there are only losers in this war. Anyone trying to tune a system should consider how every part of it can co-exist (maximizing business value) rather than chasing the absolute maximum performance of one particular query.

9. Greater storage requirement

Almost every day, I see cases where indexes take more storage than the tables themselves.

[Screenshot: size comparison showing indexes consuming more storage than the table itself]

This may sound trivial to those with money to spend on storage, but we should remember that it has a cascading effect. The total database size grows to a multiple of the actual data. So, obviously, backups take more time, storage, and network resources, and the same backup puts more load on the host machine. This also increases the time needed to restore and recover a backup. Bigger databases affect many things, including the time to build standby instances.
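Checking the table-versus-index storage split is straightforward with the built-in size functions; here is a sketch for tables in the public schema:

```sql
-- Compare each table's size with the combined size of its indexes.
SELECT relname,
       pg_size_pretty(pg_table_size(oid))   AS table_size,
       pg_size_pretty(pg_indexes_size(oid)) AS indexes_size
FROM pg_class
WHERE relkind = 'r'
  AND relnamespace = 'public'::regnamespace
ORDER BY pg_indexes_size(oid) DESC
LIMIT 10;
```

When indexes_size consistently exceeds table_size, it is worth asking which of those indexes are actually earning their keep.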

10. Indexes are more prone to corruption

I am not just talking about rarely occurring index-related bugs, like the silent index corruption in PostgreSQL 14 or index corruption due to glibc collation changes, which keep popping up now and then and affect many environments even today. Over decades of working with databases, I have observed that index corruption is reported far more frequently than table corruption. (I hope anyone involved with PostgreSQL for years, who has seen hundreds of cases, will agree with me.) As we increase the number of indexes, we increase that probability.

What should we do?

A set of critical questions should accompany any new index consideration: Is this index essential, or is it merely speeding up one query at the cost of yet another index? Is there a way to rewrite the query to get better performance? Is it OK to discard the small gain and live without the index?

Existing indexes also require a critical review over time. All unused indexes (those with idx_scan as zero in pg_stat_user_indexes) should be considered for dropping. Scripts like the one from pgexperts can help with further analysis.

PostgreSQL 16 adds one more column to pg_stat_user_indexes / pg_stat_all_indexes, named last_idx_scan, which tells us when the index was last used (as a timestamp). This helps us take a well-informed look at all the indexes in the system.
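Putting the two together, a sketch of a query for finding drop candidates (the last_idx_scan column requires PostgreSQL 16+; omit it on older versions):

```sql
-- Candidates for dropping: never scanned since statistics were reset.
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan,
       last_idx_scan        -- PostgreSQL 16+
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Before dropping anything, remember that statistics may have been reset recently, and that indexes backing primary keys, unique constraints, or replica identities will show idx_scan = 0 yet still be essential.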

Summary

The summary, in simple words: Indexes are not cheap. There is a cost, and the cost can be manifold. Indexes are not always good, and sequential scans are not always bad, either. My humble advice is to avoid improving individual queries as the first step, because it is a slippery slope. A top-down approach to tuning the system yields better results, starting from the host machine, the operating system, PostgreSQL parameters, the schema, and so on. An objective “cost-benefit analysis” is important before creating an index.

Our PostgreSQL Performance Tuning guide condenses years of database expertise into a practical guide for optimizing your PostgreSQL databases. Get it today:

Elevate your PostgreSQL Performance

FAQs

Why use indexes in PostgreSQL?

  • Improve Query Performance: Indexes significantly reduce the data search time, making queries faster by allowing the database to locate and retrieve data without scanning every row.
  • Efficiency in Sorting and Filtering: They speed up data retrieval, sorting, and filtering operations, especially in tables with large volumes of data.
  • Optimization of Join Operations: Indexes can improve the efficiency of join operations by quickly matching rows between tables.
  • Unique Data Enforcement: Unique indexes prevent duplicate values in a column, ensuring data integrity.

When should I CREATE INDEX in PostgreSQL?

Creating an index in PostgreSQL is most beneficial for columns frequently involved in search conditions (WHERE clauses), join conditions, or as part of an ORDER BY clause in queries. However, it’s important to balance the need for speed in query processing against the write and storage overhead that indexes introduce.
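Before keeping a new index, it is worth verifying that it actually changes the plan. A sketch (the table, column, and index names are hypothetical):

```sql
-- Baseline: likely a Seq Scan if no suitable index exists.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer = 42;

-- CONCURRENTLY avoids blocking writes while the index is built;
-- it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer);

-- Re-check: does the planner now choose an Index Scan, and do the
-- buffer counts actually drop enough to justify the write overhead?
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer = 42;
```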

What are the downsides of creating an index in PostgreSQL?

While indexes are invaluable for improving query performance, there are downsides. Every index added to a table increases the storage space required by the database. Additionally, indexes can slow down write operations such as INSERT, UPDATE, and DELETE, as each operation requires updating the indexes. Excessive indexing can lead to increased complexity in query planning and execution, potentially negatively affecting overall performance if not carefully managed.


