ANSI, Schmansi!

How I learned to stop worrying
and love Postgres-isms


PostgresConf US 2018

April 19, 2018
Jersey City, NJ

About Me

  • Sehrope Sarkuni
  • Founder of JackDB, Inc. (jackdb.com)
  • Contributor to pgjdbc (Java driver)
  • Contributor to node-postgres (Node.js driver)
  • Occasionally gives great advice on #postgresql

The SQL Standard

  • SQL became an ANSI standard in 1986.
  • SQL:2011 and SQL:2016 are the two most recent revisions to the standard.
  • PostgreSQL supports most of the SQL:2011 standard.

The SQL Standard

Revisions

  • SQL:2003 (4th revision) - XML (Native XML type)
  • SQL:2006 (5th revision) - XML (SQL/XML)
  • SQL:2008 (6th revision) - MERGE / TRUNCATE
  • SQL:2011 (7th revision) - Temporal databases
  • SQL:2016 (8th revision) - JSON, Row pattern recognition

Database Agnostic

Historical Reasons

  • Licensing costs of target platform
  • Avoiding perceived or actual lock-in
  • Deployment uncertainty

Database Agnostic

Limitations & Issues

  • Weak typing — "stringly" typed UUIDs
  • Weak constraints — CHECK and UNIQUE KEY
  • Basic SQL — SELECT, UPDATE, DELETE by id
  • Basic operators — =, <, >, and LIKE
  • "But my ORM solves this!" — Not really...

Database Specific

Why?

  • Use newer features today
  • Leverage full power of your database
  • Dev <> Prod parity

PostgreSQL is...

  • The most advanced open source database
  • An object relational database system
  • Featureful and standards compliant

PostgreSQL and the SQL Standard

"PostgreSQL development aims for conformance with the latest official version of the standard where such conformance does not contradict traditional features or common sense".

https://www.postgresql.org/docs/current/static/features.html

Beyond ANSI SQL

  • Many useful extensions
  • Many non-standard features
  • Some are simply a different syntax
  • Some make hard things easy
  • Some make impossible things possible

Everyday SQL

  • SELECT ... FROM
  • Casting via ::
  • Pattern Matching
  • Dollar Quoting
  • MVCC / Transaction Isolation

SELECT ... FROM

No need to specify a relation for SELECT queries

-- Look ma, no FROM!
SELECT 'test' AS x;
  x
------
 test
(1 row)

Casting via ::

Less typing and easy on the eyes

-- Using CAST(...) syntax:
SELECT CAST('2018-04-19' AS date)

-- Using :: syntax
SELECT '2018-04-19'::date

https://www.postgresql.org/docs/current/static/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS

Regular Expressions

  • More powerful than LIKE
  • Great for CHECK constraints
  • Indexable via pg_trgm contrib extension
  • Lots of related regexp_ functions
SELECT *
FROM conference c
WHERE c.year = 2018
  AND c.name ~* '(postgresql)|(pgsql)'
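The regexp_ functions go beyond matching and can extract or rewrite text. A small sketch (the input strings here are just illustrative):

```sql
-- Pull the year out of a slug like 'pgconf-us-2018':
SELECT (regexp_matches('pgconf-us-2018', '([0-9]{4})'))[1] AS year;

-- Collapse runs of whitespace with regexp_replace:
SELECT regexp_replace('too    many   spaces', '\s+', ' ', 'g');
```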

https://www.postgresql.org/docs/current/static/pgtrgm.html

Dollar Quoting

Simplifies quoting literals that include quotes

-- Regular single quotes:
SELECT 'This is a literal with a '' single quote';

-- Dollar quotes:
SELECT $$This is a literal with a ' single quote$$;

-- Tagged dollar quotes:
SELECT $tag$I can even include $$ in $$ quotes$tag$;

Dollar Quoting

Great hack for importing a block of text as data[1]

SELECT
  t.fields[1] AS title,
  t.fields[2]::date AS start_date,
  t.fields[3]::date AS end_date,
  t.fields[4] AS city
FROM (
  SELECT regexp_split_to_array(line, ',') AS fields
  FROM regexp_split_to_table(
$$PGConf US,2016-04-18,2016-04-20,New York City
PostgresConf US 2017,2017-03-28,2017-03-31,Jersey City
PostgresConf US 2018,2018-04-16,2018-04-20,Jersey City$$
       , '\n') AS line) t

[1]: For a one-off data munge... Don't do this in an actual app.

MVCC / Transaction Isolation

Read Committed — A Sane Default

  • Readers don't block writers
  • Writers don't block readers
  • Usually what you want
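A sketch of the behavior across two concurrent sessions (the account table is hypothetical):

```sql
-- Session 1:
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;
-- (not yet committed)

-- Session 2, meanwhile:
SELECT balance FROM account WHERE id = 1;
-- Returns immediately with the last committed value;
-- the in-flight UPDATE in session 1 does not block it.
```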

https://www.postgresql.org/docs/current/static/transaction-iso.html

Smarter SQL

  • generate_series(...)
  • LATERAL
  • Filtered Aggregates

generate_series(...)

The best way to bootstrap sample data

-- Without a built-in generator:
SELECT x
FROM (SELECT ROWNUM AS x
      FROM some_big_table_i_hope_is_big_enough) t
WHERE x < 1000;

-- With built-in generate_series(...)
SELECT x
FROM generate_series(1, 100) x;
 x  
-----
   1
   2
   3
  ...
(100 rows)

generate_series(...)

Combine with built-in operators for other types

SELECT now()::date - x AS some_date
FROM generate_series(1, 100) x;
 some_date
------------
 2018-04-18
 2018-04-17
 2018-04-16
 2018-04-15
 ...
(100 rows)

LATERAL

Like scalar subqueries but with more than one column

-- Most recent purchase per customer
SELECT
  c.id,
  c.email,
  p.transaction_date,
  p.total_price
FROM customer c
  LEFT JOIN LATERAL (
    SELECT
      p.transaction_date,
      p.total_price
    FROM purchase p
    -- Filter on this specific customer's purchases:
    WHERE p.customer_id = c.id
    -- Get only the most recent transaction
    ORDER BY p.transaction_date DESC
    LIMIT 1) p ON true

Filtered Aggregates

-- Workaround with CASE ... WHEN ... END
SELECT
  SUM(CASE
        WHEN c.type = 'postgres' THEN 1 ELSE 0
      END) AS num_postgres,
  SUM(CASE
        WHEN c.type != 'postgres' THEN 1 ELSE 0
      END) AS num_non_postgres
FROM conference c

-- With filtered aggregates
SELECT
  COUNT(*) FILTER (WHERE c.type = 'postgres')
    AS num_postgres,
  COUNT(*) FILTER (WHERE c.type != 'postgres')
    AS num_non_postgres
FROM conference c

Part of the SQL:2003 standard, yet scarcely supported outside PostgreSQL.

Data Modeling

  • Core Data Types
  • Domains & Custom Types
  • Partial Indexes
  • Expression Indexes

Core Data Types

PostgreSQL includes many data types

  • boolean
  • text
  • uuid
  • date
  • timestamp / timestamptz
  • json / jsonb
  • inet / cidr
  • many many more...
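A taste of what the richer types buy you: inet/cidr come with network containment operators, and date supports arithmetic directly (the values below are just illustrative):

```sql
-- Is this address inside the subnet?
SELECT '192.168.1.5'::inet << '192.168.1.0/24'::cidr;

-- Date arithmetic on actual date values:
SELECT '2018-04-19'::date + 7;
```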

https://www.postgresql.org/docs/current/static/datatype.html

Data Types — Boolean

Simplify filtering in WHERE clauses

-- Using ints 1/0 for true/false
SELECT *
FROM foo f
WHERE f.is_active = 1

vs.

-- Using an actual boolean type
SELECT *
FROM foo f
WHERE f.is_active

Data Types — Boolean

Better signal-to-noise ratio for UPDATEs

-- Using ints 1/0 for true/false
UPDATE foo
SET
  is_active =
    CASE WHEN bar = 'xyz'
      THEN 1
      ELSE 0
    END

vs.

-- Using an actual boolean type
UPDATE foo
SET is_active = (bar = 'xyz')

Data Types — text

  • An empty string (i.e. '') is not NULL
    => SELECT '' IS NULL;
    ?column? 
    ----------
    f
  • No size limits in practice [1]
  • Many character sets / server encodings / client encodings [2]

[1]: Limited to 1 GB. If you get close to that size you've got other problems...

[2]: Though you probably should just use UTF-8

Data Types — uuid

  • Added in version 8.3 (released 2008!)
  • Stored as 16-bytes (not as a string[1])
  • Generate within app or in-database
=> SELECT gen_random_uuid(); -- via the pgcrypto extension
          gen_random_uuid            
--------------------------------------
380a168a-7309-4f43-a3de-d89a86fd2284

[1]: That's a >50% storage savings which translates to more data fitting in the cache.
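The savings are easy to verify with pg_column_size(...); the uuid is a fixed 16 bytes while the text form is 36 characters plus varlena overhead:

```sql
SELECT
  pg_column_size('380a168a-7309-4f43-a3de-d89a86fd2284'::uuid) AS as_uuid,
  pg_column_size('380a168a-7309-4f43-a3de-d89a86fd2284'::text) AS as_text;
```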

Data Types — json / jsonb

  • NoSQL on ACID
  • json added in 9.2 (released 2012)
  • json becomes useful in 9.3 (released 2013)
  • jsonb added in 9.4 (released 2014)
=> SELECT t.data->>'foo'
    FROM (SELECT '{"foo": 123}'::json AS data) t;
?column? 
----------
123
(1 row)

Data Types — json / jsonb

row_to_json(...) and json_agg(...)

SELECT
  p.id,
  p.name,
  p.type,
  (SELECT
      -- Convert NULL to empty list:
      COALESCE(json_agg(t.*), '[]'::json)
    FROM (
      -- Sub-SELECT to get subset of keys we want:
      SELECT 
        c.id,
        c.name
      FROM child c
      WHERE c.parent_id = p.id) t) AS children
FROM parent p

Data Types — json / jsonb

row_to_json(...) and json_agg(...)

 id | name |  type  |        children        
----+------+--------+------------------------
  1 | foo  | wizzle | [{"id":1,"name":"a"}, +
    |      |        |  {"id":2,"name":"b"}, +
    |      |        |  {"id":3,"name":"c"}]
  2 | bar  | wizzle | [{"id":4,"name":"d"}]
  3 | baz  | wozzle | []
(3 rows)

Only one round-trip from app to database.

Data Types — Domains

  • Subset of an existing type
  • Adds CHECK constraints and defaults
-- Account ids start with an "A" followed by 8 digits
CREATE DOMAIN account_id_type AS text
  CONSTRAINT account_id_type_ck_format
    CHECK (VALUE ~ '^A[0-9]{8}$');

-- Usernames must start with a letter and be alphanumeric
CREATE DOMAIN app_username AS text
  CONSTRAINT app_username_ck_valid
    CHECK (VALUE ~ '^[a-z][a-z0-9]{2,254}$');

-- Person ids start with a "P" followed by 15 digits
CREATE DOMAIN person_id_type AS text
  CONSTRAINT person_id_type_ck_format
    CHECK (VALUE ~ '^P[0-9]{15}$');

Data Types — Domains

Define once then reuse in multiple places

CREATE TABLE person (
  id            person_id_type PRIMARY KEY,
  username      app_username,
  name          text NOT NULL
);

CREATE TABLE account (
  id          account_id_type PRIMARY KEY,
  owner_id    person_id_type NOT NULL REFERENCES person (id),
  balance     numeric NOT NULL
);

Partial Indexes

Index and constrain only a subset of a table

CREATE TABLE menu_item (
  id                        uuid PRIMARY KEY,
  menu_date                 date NOT NULL,
  name                      text NOT NULL,
  price                     numeric NOT NULL,
  is_daily_special          boolean NOT NULL DEFAULT false
);

CREATE INDEX menu_item_ix_menu_date_name
  ON menu_item (menu_date, name);

-- Only allow one item to be the daily special per day
CREATE UNIQUE INDEX menu_item_uk_menu_date_is_daily_special
  ON menu_item (menu_date)
  WHERE is_daily_special;

Expression Indexes

Allows for indexing an expression

CREATE UNIQUE INDEX foo_ix_lower_name
  ON foo (LOWER(name));
=> EXPLAIN SELECT * FROM foo WHERE LOWER(name) = 'abc';
                                  QUERY PLAN           
-------------------------------------------------------
Index Scan using foo_ix_lower_name on foo
  (cost=0.43..8.45 rows=1 width=18)
  Index Cond: (lower(name) = 'abc'::text)

Schema and Data migrations

  • Transactional DDL
  • COPY
  • Template databases

Transactional DDL

  • All or nothing schema migrations[1]
  • Simultaneous DDL and data migrations
  • Re-test in dev via ROLLBACK
BEGIN;
CREATE TABLE foo (/* ... */);
CREATE INDEX foo_ix_id ON foo (/* ... */);
INSERT INTO foo (/* ... */);
CREATE TABLE bar (/* ... */);
CREATE INDEX bar_ix_id ON bar (/* ... */);
CREATE INDEX bar_ix_foo_id ON bar (/* ... */);
INSERT INTO bar (/* ... */);
COMMIT; -- Or ROLLBACK!

[1]: Some limitations such as altering enums but most DDL is transactional.

COPY

The fastest way to export data

COPY (
  SELECT s.*
  FROM some_huge_table s
  WHERE s.foo = 'bar'
    AND s.baz = 123
) TO STDOUT
WITH (FORMAT CSV)
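COPY works in the other direction too, and from psql the \copy meta-command runs the same thing against client-side files (the file path here is hypothetical):

```sql
-- Server-side import (the file must be readable by the server):
COPY some_huge_table FROM '/path/to/data.csv'
WITH (FORMAT CSV, HEADER);

-- Client-side equivalent from psql:
-- \copy some_huge_table FROM 'data.csv' WITH (FORMAT CSV, HEADER)
```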

Template Databases

  • All databases are copies from a template
  • Default template is "template1"
  • Great for testing destructive changes

First create a backup:

CREATE DATABASE app_backup
  TEMPLATE app
  OWNER app_owner;

Template Databases

  • Run some schema migrations
  • Run your app
  • Realize that you broke everything
  • Restore from the template backup
  • Repeat!
-- Drop the current app database:
DROP DATABASE app;
-- Create it again from our backup:
CREATE DATABASE app
  TEMPLATE app_backup
  OWNER app_owner;

Making Hard Things (REALLY) Easy

Hotel Bookings

How do we prevent two customers from booking the same room?

Hotel Bookings

Using vanilla SQL

  • Use two date columns: start and end
  • Add CHECK: (start < end)
  • Add UNIQUE: (room_id, start)
  • ???
  • Serialize via SELECT ... FOR UPDATE
  • Check for overlaps in app code
  • Hope that nothing else modifies the data
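A sketch of what that vanilla-SQL dance looks like (the schema is hypothetical):

```sql
BEGIN;
-- Serialize on this room's bookings:
SELECT *
FROM booking b
WHERE b.room_id = 1
FOR UPDATE;

-- App code now checks the fetched rows for overlaps with the
-- requested [start, end) range, and only then inserts:
INSERT INTO booking (room_id, start_date, end_date)
  VALUES (1, '2018-01-05', '2018-01-15');
COMMIT;
-- Any code path that skips this dance can corrupt the data.
```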

Hotel Bookings

Using Range Types / Exclusion

CREATE TABLE reservation (
    id              bigserial PRIMARY KEY,
    room_id         bigint NOT NULL REFERENCES room (id),
    booking_dates   daterange NOT NULL,
    -- Prohibit overlapping bookings for the same room:
    CONSTRAINT reservation_uk_room_id_booking_dates
      EXCLUDE USING gist
        (room_id WITH =, booking_dates WITH &&),
    -- Ensure that the daterange is half-open
    CONSTRAINT reservation_ck_booking_dates_valid
      CHECK (        lower_inc(booking_dates)
             AND NOT upper_inc(booking_dates))
);

Available since v9.2 (released in 2012!)

Range Types / Exclusion

-- [123456789 123456789 123456789 ]
-- [*********-                    ]
-- [         ********-            ]
-- [    **********-               ]
INSERT INTO reservation (room_id, booking_dates)
  VALUES (1, '[2018-01-01,2018-01-10)');
INSERT INTO reservation (room_id, booking_dates)
  VALUES (1, '[2018-01-10,2018-01-18)');
INSERT INTO reservation (room_id, booking_dates)
  VALUES (1, '[2018-01-05,2018-01-15)');
ERROR:  23P01: conflicting key value violates exclusion
  constraint "reservation_uk_room_id_booking_dates"
DETAIL:  Key (room_id, booking_dates)=
  (1, [2018-01-05,2018-01-15)) conflicts with existing
  key (room_id, booking_dates)=(1, [2018-01-01,2018-01-10)).
SCHEMA NAME:  public
TABLE NAME:  reservation
CONSTRAINT NAME:  reservation_uk_room_id_booking_dates
LOCATION:  check_exclusion_or_unique_constraint, execIndexing.c:836

Making The Impossible Possible

Combining Datasets

Say we have:

  • Primary DB for core data
  • Billing history in a separate DB
  • Website analytics in a third DB
  • User sessions (Redis)

How to get a live view of all of this together?

Foreign Data Wrappers

  • Query remote tables as if they're local
  • Turns your database into a data hub
  • Support for RDBMS and non-RDBMS
  • Extensions exist for pure APIs too

postgres_fdw

CREATE EXTENSION postgres_fdw;
CREATE SERVER billing_db
   FOREIGN DATA WRAPPER postgres_fdw
   OPTIONS ( /* Server info for billing database */ );
CREATE USER MAPPING FOR public
  SERVER billing_db
  OPTIONS ( /* Credentials for billing database */ );
CREATE FOREIGN TABLE billing_transaction (
    id                  uuid NOT NULL,
    customer_id         uuid NOT NULL,
    created_at          timestamptz NOT NULL,
    amount              numeric NOT NULL
  ) SERVER billing_db;

redis_fdw

CREATE EXTENSION redis_fdw;
CREATE SERVER redis_server
  FOREIGN DATA WRAPPER redis_fdw
  OPTIONS ( /* Server details for Redis */ );
CREATE USER MAPPING FOR PUBLIC
  SERVER redis_server
  OPTIONS ( /* Credentials for Redis */ );
CREATE FOREIGN TABLE redis_session (
    key text,
    value text)
  SERVER redis_server;

Join Them All!

SELECT
  /* ... Customer details ... */
  bt.total_amount,
  (s.detail->>'created_at')::timestamptz AS session_created_at,
  (s.detail->>'page_views')::int AS page_views
FROM (SELECT
        bt.customer_id,
        SUM(bt.amount) AS total_amount
      FROM billing_transaction bt
      WHERE bt.created_at > (now() - '30 days'::interval)
      GROUP BY bt.customer_id) bt
  JOIN customer c ON c.id = bt.customer_id,
  LATERAL (SELECT ((s.value)::json) AS detail
           FROM redis_session s
           WHERE s.key = ('customer:' || c.id)) s
WHERE (s.detail->>'page_views')::int > 10

Summary

Don't crawl on N databases...

when you can run on one!

Q&A

Sehrope Sarkuni

sehrope@jackdb.com

Thank You!