Databases · 18 min read

PostgreSQL Mistakes You're Probably Making (And How to Fix Them)

A comprehensive guide to avoiding common PostgreSQL pitfalls that can compromise performance, data integrity, and maintainability.


PostgreSQL is one of the most powerful and reliable relational databases available today. It offers an extraordinary range of features, from advanced indexing to full-text search to JSON support. But with that power comes complexity, and even experienced developers fall into traps that can silently degrade performance, corrupt data, or create maintenance nightmares down the road.

This guide covers the most common PostgreSQL anti-patterns, explains why they are harmful, and shows you the correct approach for each one. Whether you are building a new application or maintaining an existing one, auditing your codebase against this list is well worth the effort.

Key Takeaways

  • Data Types: Choosing the wrong data type (e.g., char(n), money, serial, timestamp without time zone) can lead to subtle bugs and data integrity issues. Prefer text, numeric, identity columns, and timestamptz.
  • Queries: Certain SQL constructs like NOT IN with nullable columns and BETWEEN for timestamp ranges produce surprising results. Use NOT EXISTS and explicit range comparisons instead.
  • Schema Design: Avoid PostgreSQL Rules and Table Inheritance. Use triggers and foreign keys or native partitioning as alternatives.
  • Security: Never use trust authentication over TCP/IP. Always use scram-sha-256 for password-based authentication.

Database Encoding

The Problem

When creating a PostgreSQL database, you can specify the character encoding. One option is SQL_ASCII, which is effectively “no encoding at all.” It does not validate or convert character data in any way. This means your database will happily accept and store a mix of UTF-8, Latin-1, Windows-1252, and any other encoding, often within the same column. Once you have mixed encodings in your data, there is no reliable way to sort, search, or compare text correctly. Functions like lower(), upper(), and regular expression matching will produce incorrect results or outright errors.

What to Do Instead

Always create your databases with UTF-8 encoding:

CREATE DATABASE myapp
  ENCODING 'UTF8'
  LC_COLLATE 'en_US.UTF-8'
  LC_CTYPE 'en_US.UTF-8'
  TEMPLATE template0;

UTF-8 can represent every character in the Unicode standard, handles multilingual data correctly, and is the de facto standard for modern applications.

The Exception

There is essentially no valid reason to use SQL_ASCII in a new project. The only scenario where you might encounter it is when migrating a legacy database that already contains mixed-encoding data, and even then your goal should be to clean the data and convert to UTF-8 as soon as possible.
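To audit an existing cluster, you can list each database's encoding straight from the catalog. This is a read-only query and safe to run anywhere:

```sql
-- One row per database, with its encoding decoded to a readable name
SELECT datname, pg_encoding_to_char(encoding) AS encoding
FROM pg_database
ORDER BY datname;
```

Anything reporting SQL_ASCII is a candidate for cleanup and conversion to UTF8.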


Tool Usage

The Problem

Many developers connect to PostgreSQL using psql -W or psql --password. The -W flag forces psql to prompt for a password before attempting to connect. The issue is that PostgreSQL may not even require a password for your connection (depending on pg_hba.conf settings). If it does not, you type your password for nothing. If it does, psql will prompt you automatically anyway. Using -W provides no benefit and can create confusion about whether authentication is actually configured.

What to Do Instead

Simply run psql without the -W flag:

psql -h localhost -U myuser -d mydb

If a password is required, psql will prompt you. If it is not required, you connect directly. This way you always get accurate feedback about your authentication setup.

The Exception

The -W flag can save one round-trip to the server if you know a password will be required and you want to avoid the initial connection attempt. In practice, this is negligible. If you need non-interactive authentication, use a .pgpass file or the PGPASSWORD environment variable (though the latter is discouraged for security reasons).
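For reference, a .pgpass entry is one colon-separated line per connection; any field may be * as a wildcard, and libpq ignores the file unless its permissions are 0600. The host, database, user, and password below are placeholders:

```
# ~/.pgpass   format: hostname:port:database:username:password
localhost:5432:mydb:myuser:s3cret-password
```

After creating the file, run chmod 0600 ~/.pgpass so libpq will actually use it.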


Query Rewriting with Rules

The Problem

PostgreSQL Rules (the CREATE RULE system) rewrite queries before they are executed. While this may sound convenient, rules are notoriously difficult to reason about. They interact badly with other features, can silently change the behavior of your queries, and produce confusing results when combined with RETURNING clauses, CTEs, or triggers. The PostgreSQL documentation itself discourages their use in most scenarios.

For example, a rule that redirects inserts to a different table can silently drop data if the rule does not account for all columns, and debugging this is extremely difficult because the query that actually runs is different from the one you wrote.

What to Do Instead

Use triggers for any logic that needs to run automatically in response to data changes:

CREATE OR REPLACE FUNCTION log_update()
RETURNS trigger AS $$
BEGIN
  INSERT INTO audit_log (table_name, record_id, changed_at)
  VALUES (TG_TABLE_NAME, NEW.id, now());
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_log_update
  AFTER UPDATE ON orders
  FOR EACH ROW
  EXECUTE FUNCTION log_update();

Triggers are explicit, well-understood, and compose correctly with the rest of PostgreSQL’s features.

The Exception

The one place where rules are still commonly used is for updatable views, though even there, INSTEAD OF triggers on views are the preferred modern approach.
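For illustration, here is a minimal sketch of that modern approach: an INSTEAD OF trigger that makes a view updatable. The active_orders view and the orders columns are hypothetical:

```sql
CREATE VIEW active_orders AS
  SELECT id, customer_id, total
  FROM orders
  WHERE status = 'active';

CREATE OR REPLACE FUNCTION active_orders_update()
RETURNS trigger AS $$
BEGIN
  -- Route the update back to the base table
  UPDATE orders SET total = NEW.total WHERE id = OLD.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_active_orders_update
  INSTEAD OF UPDATE ON active_orders
  FOR EACH ROW
  EXECUTE FUNCTION active_orders_update();
```

For a simple single-table view like this one, recent PostgreSQL versions make the view automatically updatable with no trigger at all; INSTEAD OF triggers earn their keep when the view involves joins or aggregation.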


Table Inheritance

The Problem

PostgreSQL supports table inheritance, where a child table inherits columns from a parent table. While this sounds like a natural fit for object-oriented modeling, it has significant limitations:

  • Unique constraints and foreign keys do not span the inheritance hierarchy.
  • A SELECT on the parent table includes rows from all children, which can be surprising.
  • Indexes on the parent do not cover child tables.
  • Many ORM tools do not support inheritance well.

These limitations mean that table inheritance frequently leads to data integrity issues that are only discovered in production.

What to Do Instead

Use standard relational design with foreign keys:

CREATE TABLE vehicles (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  make text NOT NULL,
  model text NOT NULL,
  year integer NOT NULL
);

CREATE TABLE trucks (
  vehicle_id integer PRIMARY KEY REFERENCES vehicles(id),
  payload_capacity numeric NOT NULL,
  bed_length numeric NOT NULL
);

If your goal is partitioning large tables for performance, use PostgreSQL’s native declarative partitioning, which was introduced in version 10 and improved significantly in later releases:

CREATE TABLE events (
  id integer GENERATED ALWAYS AS IDENTITY,
  event_time timestamptz NOT NULL,
  payload jsonb
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2024 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE events_2025 PARTITION OF events
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

The Exception

Table inheritance is still used internally by the partitioning system, and some legacy applications rely on it. For new designs, there is no compelling reason to use inheritance directly.


NOT IN with Subqueries

The Problem

NOT IN is one of the most dangerous constructs in SQL when used with a subquery, because of how it handles NULL values. If the subquery returns even a single NULL, the NOT IN condition can never evaluate to TRUE: for every row it is either FALSE or NULL (unknown), NULL is treated as false in a WHERE clause, and zero rows are returned.

-- This returns ZERO rows if bar.x contains any NULL values
SELECT * FROM foo WHERE col NOT IN (SELECT x FROM bar);

To illustrate why, consider:

SELECT * FROM foo WHERE col NOT IN (1, NULL);

For each row, PostgreSQL evaluates: col != 1 AND col != NULL. Since col != NULL is always NULL (unknown), the entire expression is NULL, and no rows pass the filter. This behavior is correct according to the SQL standard, but it surprises nearly everyone who encounters it.

What to Do Instead

Use NOT EXISTS, which handles NULL values correctly:

SELECT *
FROM foo
WHERE NOT EXISTS (
  SELECT 1 FROM bar WHERE foo.col = bar.x
);

NOT EXISTS returns TRUE when the subquery produces no rows, regardless of NULL values. It is also often faster because the optimizer can stop evaluating the subquery as soon as it finds a match.

The Exception

NOT IN is safe when used with a hardcoded list of non-null values (e.g., WHERE status NOT IN ('active', 'pending')). But if there is any chance the values could be NULL, or if the list comes from a subquery, always use NOT EXISTS.
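If a query must keep the NOT IN form for some reason, explicitly excluding NULLs from the subquery also restores the intuitive behavior:

```sql
SELECT *
FROM foo
WHERE col NOT IN (SELECT x FROM bar WHERE x IS NOT NULL);
```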


Uppercase Table and Column Names

The Problem

PostgreSQL folds unquoted identifiers to lowercase. If you create a table with uppercase or mixed-case names, you must quote them everywhere:

CREATE TABLE "Users" ("FirstName" text, "LastName" text);

-- Every query must use double quotes
SELECT "FirstName" FROM "Users" WHERE "LastName" = 'Smith';

This is tedious, error-prone, and makes every query harder to read and write. Forgetting a single set of quotes produces a “column does not exist” error that can be baffling if you do not know about identifier folding.
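A quick illustration of the folding behavior and the error it produces:

```sql
CREATE TABLE "Users" ("FirstName" text);

-- Unquoted, so PostgreSQL folds the identifier to firstname before lookup
SELECT FirstName FROM "Users";
-- ERROR:  column "firstname" does not exist
```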

What to Do Instead

Use lowercase_with_underscores for all identifiers:

CREATE TABLE users (
  first_name text,
  last_name text
);

-- Clean, simple, no quoting needed
SELECT first_name FROM users WHERE last_name = 'Smith';

This convention is idiomatic PostgreSQL, works without quoting, and aligns with how most PostgreSQL tools and libraries expect names to be formatted.

The Exception

If you are interfacing with a system that requires specific casing (e.g., a legacy application that expects CamelCase column names), you may be forced to use quoted identifiers. Minimize this as much as possible.


BETWEEN for Timestamps

The Problem

BETWEEN is inclusive on both ends. When used with timestamps, this creates subtle off-by-one bugs:

-- BAD: This includes midnight of June 8 but excludes the rest of that day
SELECT * FROM events
WHERE event_time BETWEEN '2018-06-01' AND '2018-06-08';

The upper bound '2018-06-08' is cast to '2018-06-08 00:00:00', so events on June 8 after midnight are excluded. If you change the upper bound to '2018-06-08 23:59:59', you still miss events in the last second of the day. Using '2018-06-08 23:59:59.999999' is fragile and depends on the timestamp precision.

What to Do Instead

Use explicit range comparisons with a half-open interval:

-- GOOD: includes all of June 1 through all of June 7
SELECT * FROM events
WHERE event_time >= '2018-06-01'
  AND event_time < '2018-06-08';

The half-open interval (>= start, < end) is unambiguous, works correctly regardless of timestamp precision, and is the standard approach in most database applications.

The Exception

BETWEEN is fine for discrete types like integer or date (not timestamp), where there are no fractional values between the bounds.
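For example, with a date column the inclusive bounds are unambiguous (bookings is a hypothetical table):

```sql
-- Fine: dates are discrete, so there is nothing "between"
-- the end of June 7 and the start of June 8
SELECT * FROM bookings
WHERE booking_date BETWEEN DATE '2018-06-01' AND DATE '2018-06-07';
```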


Date and Time Storage

Use timestamptz, Not timestamp

The Problem: timestamp (also known as timestamp without time zone) stores a date and time with no timezone information. This means PostgreSQL has no idea whether '2024-03-15 14:30:00' is in UTC, Eastern Time, or Tokyo Time. If your application servers are in different time zones, or if a server’s timezone changes, your data becomes silently incorrect.

What to Do Instead: Always use timestamptz (timestamp with time zone):

CREATE TABLE orders (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  created_at timestamptz NOT NULL DEFAULT now(),
  shipped_at timestamptz
);

timestamptz stores the value in UTC internally and converts to/from the session’s timezone on input and output. This ensures consistent behavior regardless of server or client timezone settings.

Don’t Store UTC in timestamp Columns

The Problem: A common workaround is to “always store UTC” in a timestamp (without timezone) column. The issue is that PostgreSQL does not know the value is UTC. Any timezone-aware function (like AT TIME ZONE) will treat the value according to the session’s timezone setting, producing incorrect results. You are also relying on every developer and every application to consistently store UTC, with no enforcement from the database.
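A short sketch of the ambiguity, assuming a session timezone of America/New_York: the same stored naive value yields two different points in time depending on how a query happens to interpret it.

```sql
SET timezone = 'America/New_York';

-- Read the naive value as UTC:
SELECT '2024-03-15 14:30:00'::timestamp AT TIME ZONE 'UTC';
-- 2024-03-15 10:30:00-04

-- Read the very same value as local time:
SELECT '2024-03-15 14:30:00'::timestamp AT TIME ZONE 'America/New_York';
-- 2024-03-15 14:30:00-04
```

Nothing in the column itself says which reading is correct; with timestamptz the question never arises.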

What to Do Instead: Use timestamptz. It stores UTC internally and handles conversions automatically.

Never Use timetz

The Problem: timetz (time with time zone) stores a time of day with a timezone offset. This is almost never useful because a timezone offset without a date is meaningless. Daylight saving time rules depend on the date, so a time zone offset for “3:00 PM” is ambiguous without knowing which day it is. The PostgreSQL documentation explicitly states that timetz exists only for SQL standard compliance and should not be used.

What to Do Instead: Use timestamptz if you need to store a point in time. If you genuinely need just a time of day (e.g., “the store opens at 9:00 AM”), use time (without timezone) and handle timezone conversions in your application.
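A sketch of the time-of-day case, using a hypothetical store_hours table:

```sql
CREATE TABLE store_hours (
  store_id integer NOT NULL,
  opens_at time NOT NULL,   -- e.g. 09:00, no offset attached
  closes_at time NOT NULL
);
```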

Don’t Use CURRENT_TIME

The Problem: CURRENT_TIME returns a timetz value, which as discussed above is almost never what you want.

What to Do Instead: Use CURRENT_TIMESTAMP, now(), or clock_timestamp():

-- Transaction start time (same value throughout the transaction)
SELECT now();
SELECT CURRENT_TIMESTAMP;

-- Wall clock time (changes within a transaction)
SELECT clock_timestamp();

Don’t Use timestamp(0) or timestamptz(0)

The Problem: The (0) precision modifier rounds timestamps to the nearest second. This sounds harmless, but rounding can push a timestamp forward into the next second, minute, hour, or even day. A value of '2024-01-01 23:59:59.7' gets rounded to '2024-01-02 00:00:00', which is a completely different day.
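The rounding surprise is easy to reproduce:

```sql
SELECT '2024-01-01 23:59:59.7'::timestamp(0);
-- 2024-01-02 00:00:00  (rounded forward into the next day)
```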

What to Do Instead: Store full-precision timestamps and round in your application or queries when needed:

-- Store full precision
CREATE TABLE logs (
  created_at timestamptz NOT NULL DEFAULT now()
);

-- Round when displaying
SELECT date_trunc('second', created_at) FROM logs;

Don’t Use +/-HH:mm Offsets as Timezone Names

The Problem: Using fixed offsets like '+05:30' or '-04:00' as timezone identifiers ignores daylight saving time and other historical timezone changes. They represent a fixed offset at one point in time, not a geographic timezone with its full set of rules.

What to Do Instead: Use IANA timezone names:

-- BAD
SET timezone = '+05:30';

-- GOOD
SET timezone = 'Asia/Kolkata';

IANA timezone names (e.g., 'America/New_York', 'Europe/London', 'Asia/Tokyo') encode the complete history of UTC offset changes, including daylight saving time transitions.


Text Storage

Never Use char(n)

The Problem: char(n) pads values with trailing spaces to fill the declared length. This wastes storage, causes confusing comparison behavior (trailing spaces may or may not be significant depending on the operation), and is not faster than text or varchar. PostgreSQL stores char(n) and text the same way internally, so there is zero performance benefit.

-- char(10) stores 'hello     ' (padded with 5 spaces)
-- This wastes space and creates comparison surprises
SELECT 'hello'::char(10) = 'hello     '; -- true
SELECT length('hello'::char(10));        -- 5 (trailing spaces stripped by length())

What to Do Instead

Use text for all string columns:

CREATE TABLE customers (
  name text NOT NULL,
  email text NOT NULL
);

Don’t Use char(n) Even for Fixed-Length Identifiers

The Problem: Developers often reach for char(3) for things like country codes or currency codes, reasoning that the length is always exactly 3. But char(n) does not validate the length; it pads shorter values with spaces. If someone inserts 'US' into a char(3) column, it stores 'US ' without complaint.

What to Do Instead: Use text with a CHECK constraint that enforces both the format and the length:

CREATE TABLE countries (
  code text CHECK (code ~ '^[A-Z]{3}$'),
  name text NOT NULL
);

This is strictly better: it rejects invalid values (too short, too long, wrong characters), whereas char(3) silently pads short values and, on an explicit cast, silently truncates long ones.

Think Before Using varchar(n)

The Problem: varchar(n) is better than char(n) because it does not pad, but the length limit is often arbitrary and causes problems later. Defining varchar(255) gains no performance advantage over text; PostgreSQL handles them identically in terms of storage and performance. What it does create is a constraint that may need to change later, and on PostgreSQL versions before 9.2, even increasing the declared length required a full table rewrite.

What to Do Instead: Use text with CHECK constraints when you need validation:

CREATE TABLE users (
  username text NOT NULL CHECK (length(username) BETWEEN 3 AND 50),
  bio text CHECK (length(bio) <= 10000)
);

CHECK constraints can be added or removed without rewriting the table, and they can enforce more complex rules than a simple length limit.

The Exception

If you are building a database that must comply with a specific standard or specification that mandates varchar(n) (e.g., certain financial or healthcare data standards), then use it. But for general application development, text with constraints is the better choice.


Other Data Types

Don’t Use money

The Problem: The money type in PostgreSQL has several serious issues:

  • Its precision and formatting depend on the lc_monetary locale setting, which can differ between servers or sessions.
  • Transferring money data between databases with different locale settings can silently change values.
  • It has limited precision (fixed to the locale’s currency decimal places).
  • Arithmetic with money can produce surprising results.

What to Do Instead

Use numeric with explicit precision for monetary amounts:

CREATE TABLE transactions (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  amount numeric(19,4) NOT NULL,
  currency text NOT NULL CHECK (currency ~ '^[A-Z]{3}$'),
  created_at timestamptz NOT NULL DEFAULT now()
);

numeric(19,4) gives you up to 15 digits before the decimal point and 4 digits after, which is sufficient for virtually any financial application. By storing the currency code separately, you make the system locale-independent.

The Exception

If you are building a quick prototype or internal tool where you will never transfer data between systems and the locale will never change, money might save you a few keystrokes. But for anything that matters, use numeric.

Use Identity Columns Instead of serial

The Problem: serial and bigserial are PostgreSQL-specific pseudo-types that create a sequence and set a default value, but they have several drawbacks:

  • The sequence is not truly “owned” by the column in a way that prevents other tables from using it.
  • Permissions on the sequence must be managed separately.
  • serial does not prevent manual inserts of arbitrary values, which can cause sequence conflicts.
  • serial is not SQL standard compliant.
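A sketch of the classic serial footgun (legacy_users is a hypothetical table):

```sql
CREATE TABLE legacy_users (
  id serial PRIMARY KEY,
  name text NOT NULL
);

-- Nothing stops an explicit id, and the sequence is not advanced:
INSERT INTO legacy_users (id, name) VALUES (3, 'alice');

-- Default inserts draw 1, 2, ... from the sequence and
-- eventually collide with the manually inserted id = 3:
INSERT INTO legacy_users (name) VALUES ('bob');   -- id 1
INSERT INTO legacy_users (name) VALUES ('carol'); -- id 2
INSERT INTO legacy_users (name) VALUES ('dave');  -- id 3 -> unique violation
```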

What to Do Instead

Use identity columns, which were introduced in PostgreSQL 10:

CREATE TABLE users (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  username text NOT NULL,
  email text NOT NULL
);

GENERATED ALWAYS AS IDENTITY ties the sequence tightly to the column, prevents manual value insertion by default (you must explicitly use OVERRIDING SYSTEM VALUE to bypass it), and follows the SQL standard. Use GENERATED BY DEFAULT AS IDENTITY if you need to allow manual inserts occasionally.
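With the users table above, the protection looks like this:

```sql
-- Rejected: id is GENERATED ALWAYS
INSERT INTO users (id, username, email)
VALUES (42, 'alice', 'alice@example.com');
-- ERROR:  cannot insert a non-DEFAULT value into column "id"

-- Allowed, but only with an explicit escape hatch:
INSERT INTO users (id, username, email)
OVERRIDING SYSTEM VALUE
VALUES (42, 'alice', 'alice@example.com');
```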

The Exception

If you are maintaining a legacy codebase that already uses serial extensively, there is no urgent need to migrate existing tables. But for all new tables, use identity columns.


Authentication

Never Use trust Authentication Over TCP/IP

The Problem: The trust authentication method in pg_hba.conf allows anyone to connect without a password. When configured for TCP/IP connections (host entries), this means anyone who can reach the PostgreSQL port on the network can connect as any user, including the superuser, and do anything they want: read all data, modify it, drop databases, or even execute operating system commands.

# DANGEROUS: Anyone on the network can connect as any user
host    all    all    0.0.0.0/0    trust

This is the database equivalent of leaving your front door open with a sign that says “take whatever you want.”

What to Do Instead

Use scram-sha-256 authentication for all TCP/IP connections:

# pg_hba.conf
host    all    all    0.0.0.0/0    scram-sha-256

Set the password encryption method in postgresql.conf:

password_encryption = scram-sha-256

Then set strong passwords for all database users:

ALTER USER myuser WITH PASSWORD 'a-strong-randomly-generated-password';

scram-sha-256 is the most secure password-based authentication method available in PostgreSQL. It prevents password sniffing and replay attacks.

The Exception

trust authentication is acceptable for local (Unix socket) connections on a development machine where you are the only user. It should never be used for TCP/IP connections in any environment, including development, because other processes or users on the network could exploit it.
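If you do use trust on a single-user development machine, scope the pg_hba.conf entry to Unix-socket connections only:

```
# Acceptable only on a single-user dev machine, over Unix sockets:
local   all    all    trust
```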


Wrap Up

PostgreSQL gives you an enormous amount of power, but it also gives you enough rope to create serious problems if you are not careful. The anti-patterns covered in this guide are not hypothetical: they show up regularly in production systems and cause real data integrity issues, performance problems, and security vulnerabilities.

Here is a quick checklist you can use when reviewing your PostgreSQL setup:

  • Encoding: UTF-8, not SQL_ASCII.
  • Time types: timestamptz, not timestamp. Never timetz.
  • Text types: text with CHECK constraints, not char(n) or arbitrary varchar(n).
  • Numeric money: numeric(19,4), not money.
  • Primary keys: Identity columns, not serial.
  • Queries: NOT EXISTS instead of NOT IN with subqueries. Explicit ranges instead of BETWEEN for timestamps.
  • Schema: Triggers instead of Rules. Foreign keys instead of Table Inheritance.
  • Authentication: scram-sha-256 over TCP/IP, never trust.

Fixing these issues is usually straightforward, but finding them requires knowing what to look for. Bookmark this guide and refer back to it when designing new tables, writing queries, or reviewing pull requests. Your future self and your team will thank you.