▲ Vercel Integration now GA - Create a database branch for every preview deployment, automatically.Learn here

The pg_partman extension

Manage large Postgres tables using the PostgreSQL Partition Manager extension

pg_partman is a Postgres extension that simplifies the management of partitioned tables. Partitioning refers to splitting a single table into smaller pieces called partitions. This is done based on the values in a key column or set of columns. Even though partitions are stored as separate physical tables, the partitioned table can still be queried as a single logical table. This can significantly enhance query performance and also help you manage the data lifecycle of tables that grow very large.

While Postgres natively supports partitioning, pg_partman helps set up and manage partitioned tables by automating steps like creating new partitions and handling the data lifecycle for a given retention policy.

Try it on Neon!

Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.

Sign Up

In this guide, we’ll learn how to set up and use the pg_partman extension with your Neon Postgres project. We'll cover why partitioning is helpful, how to enable pg_partman, creating partitioned tables, and automating partition maintenance.

note

pg_partman is an open-source Postgres extension that can be installed in any Neon project using the instructions below. Detailed installation instructions and compatibility information can be found in the pg_partman documentation.

Enable the pg_partman extension

You can enable the extension by running the following CREATE EXTENSION statement in the Neon SQL Editor or from a client such as psql that is connected to Neon.

CREATE EXTENSION IF NOT EXISTS pg_partman;

For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the psql client with Neon, see Connect with psql.

Version Compatibility:

pg_partman works with Postgres 14 and above, complementing the native partitioning features introduced in these versions.

Why partition your data?

For tables that grow very large, partitioning offers several benefits:

  • Faster queries: Partitioning allows Postgres to quickly locate and retrieve data within a specific partition, rather than scanning the entire table.
  • Scalability: Partitioning makes database administration simpler. For example, smaller partitions are easier to load and delete or back up and recover.
  • Managing the data lifecycle: Easier management of the data lifecycle by archiving or purging old partitions, which can be moved to cheaper storage options without affecting the active dataset.

Native partitioning vs pg_partman

Postgres supports partitioning tables natively, with the following strategies to divide the data:

  • List partitioning: Data is distributed across partitions based on a list of values, such as a category or location.
  • Range partitioning: Data is distributed across partitions based on ranges of values, such as dates or numerical ranges.

With native partitioning, you need to manually create and manage partitions for your table.

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int
) PARTITION BY RANGE (logdate);

-- Create a partition for each month of logged data
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

-- Moving older data to a different table
ALTER TABLE measurement DETACH PARTITION measurement_y2005m10;

pg_partman only supports creating partitions that are number or time-based, with each partition covering a range of values. However, this simplifies the process of creating and managing partitions.

Example: Partitioning user-activity data

Consider a social media platform that tracks user interactions in their website application, such as likes, comments, and shares. The data is stored in a table called user_activities, where activity_type stores the type of activity and the other columns store additional information about the activity.

Setting up a partitioned table

Given the large volume of data generated by user interactions, partitioning the user_activities table can help keep queries manageable. Recent activity data is typically the most interesting for both the platform and its users, so activity_time is a good candidate to partition on.

CREATE TABLE user_activities (
    activity_id serial,
    activity_time TIMESTAMPTZ NOT NULL,
    activity_type TEXT NOT NULL,
    content_id INT NOT NULL,
    user_id INT NOT NULL
)
PARTITION BY RANGE (activity_time);

To create a partition for each week of activity data, you can run the following query:

SELECT create_parent('public.user_activities', 'activity_time', '1 week');

This will create a new partition for each week of data in the user_activities table. We can insert some sample data into the table:

INSERT INTO user_activities (activity_time, activity_type, content_id, user_id)
VALUES
    ('2024-03-15 10:00:00', 'like', 1001, 101),
    ('2024-03-16 15:30:00', 'comment', 1002, 102),
    ('2024-03-17 09:45:00', 'share', 1003, 103),
    ('2024-03-18 18:20:00', 'like', 1004, 104),
    ('2024-03-19 12:10:00', 'comment', 1005, 105),
    ('2024-03-20 08:00:00', 'like', 1006, 106),
    ('2024-03-21 14:15:00', 'share', 1007, 107),
    ('2024-03-22 11:30:00', 'like', 1008, 108),
    ('2024-03-23 16:45:00', 'comment', 1009, 109),
    ('2024-03-24 20:00:00', 'share', 1010, 110),
    ('2024-03-25 09:30:00', 'like', 1011, 111),
    ('2024-03-26 13:45:00', 'comment', 1012, 112),
    ('2024-03-27 17:00:00', 'share', 1013, 113),
    ('2024-03-28 11:15:00', 'like', 1014, 114),
    ('2024-03-29 15:30:00', 'comment', 1015, 115);

Querying partitioned tables

We can query against the user_activities table as if it were a single table, and Postgres will automatically route the query to the correct partition(s) based on the activity_time column.

SELECT * FROM user_activities WHERE activity_time BETWEEN '2024-03-20' AND '2024-03-25';

This query returns the following results:

activity_id |     activity_time      | activity_type | content_id | user_id
-------------+------------------------+---------------+------------+---------
          16 | 2024-03-20 08:00:00+00 | like          |       1006 |     106
          17 | 2024-03-21 14:15:00+00 | share         |       1007 |     107
          18 | 2024-03-22 11:30:00+00 | like          |       1008 |     108
          19 | 2024-03-23 16:45:00+00 | comment       |       1009 |     109
          20 | 2024-03-24 20:00:00+00 | share         |       1010 |     110
(5 rows)

To see the list of all partitions created for the user_activities table, you can run the following query:

SELECT table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name LIKE 'user_activities_%';

This will return the following results:

table_name
---------------------------
 user_activities_p20240329
 user_activities_p20240405
 user_activities_p20240315
 user_activities_p20240322
 user_activities_p20240412
 user_activities_p20240419
 user_activities_p20240426
 user_activities_default
 user_activities_p20240301
 user_activities_p20240308
(10 rows)

pg_partman automatically created tables for weekly intervals close to the current data. As more data is inserted, it will create new partitions. Additionally, there is a user_activities_default table that stores data that doesn't fit into any of the existing partitions.

Data retention policies

To make sure that old data is automatically removed from the main table, you can set up a retention policy:

UPDATE part_config 
SET retention = '4 weeks', retention_keep_table = true
WHERE parent_table = 'public.user_activities';

The background worker process that comes bundled with pg_partman automatically detaches the old partitions that are older than 4 weeks from the main table. Since we've set retention_keep_table to true, the old partitions are kept as separate tables, and not dropped from the database.

Uniqueness constraints

Postgres doesn't support indexes or unique constraints that span multiple tables. Since a partitioned table is made up of multiple physical tables, you can't create a unique constraint that spans all the partitions. For example, the following query will fail:

ALTER TABLE user_activities ADD CONSTRAINT unique_activity UNIQUE (activity_id);

It returns the following error:

ERROR:  unique constraint on partitioned table must include all partitioning columns
DETAIL:  UNIQUE constraint on table "user_activities" lacks column "activity_time" which is part of the partition key.

However, when the unique constraint involves partition key columns, Postgres can guarantee uniqueness across all partitions. In this way, different partitions cannot share the same values for the partition key columns, which allows unique constraints to be enforced.

For example, including the activity_time column in the unique constraint will work because activity_time is a partition key column:

ALTER TABLE user_activities ADD CONSTRAINT unique_activity UNIQUE (activity_id, activity_time);

Conclusion

By leveraging pg_partman, you can significantly enhance the native partitioning functionality of Postgres, particularly for large-scale and time-series datasets. The extension simplifies partition management, automates retention and archival tasks, and improves query performance.

Reference

Last updated on

Edit this page
Was this page helpful?