2014-10-02

Track Date-Time Of Row Creation & Modification In Postgres

Here is some example code for using Postgres to automatically track when a row is added to a table and when a row is modified.

Solution requires three steps.

STEP 1 — Columns

First, create a column of type `TIMESTAMP WITH TIME ZONE`. Do not abbreviate to `TIMESTAMP` as that is defined by the SQL spec as `TIMESTAMP WITHOUT TIME ZONE` which is almost certainly not what you want, as explained here by Postgres expert David E. Wheeler.

Name the column exactly the same on each table you want updated. To track creation and mod date-times, I use the names:
  • row_created_
  • row_modified_
In my own names, I avoid abbreviations. Also, I always include a trailing underscore to avoid collision with reserved keywords, as suggested by the SQL spec. 

Those columns are marked `NOT NULL`. Each has a `DEFAULT` of the current moment, set by calling `CLOCK_TIMESTAMP()`. Note that Postgres has three kinds of "now": (1) The actual current moment read from the computer’s clock, (2) When the statement started execution, (3) When the transaction started execution. One could arguably choose any of those three for their own creation and mod values.

You may or may not want a default value for your `row_modified_` column. Some people argue for the precision of the semantics that a new record has not yet been modified. My counter-argument is two-fold: (a) That means allowing NULL values, and I am of the camp believing NULL to be the work of the devil, (b) In my experience, I rarely look at creation date-time, but just scan the mod column where I then find missing values (nulls) to be distracting/confusing.

STEP 2 — Function

Secondly, define a function to generically update any table as long as that table has a column named exactly as we expect.

A "function" is an old-fashioned name for a chunk of server-side code to be executed at run-time. We define this method as a database object, designating a name and so on just like a table or column.

To write a function, we need a programming language more powerful/flexible than SQL. Postgres is capable of running any number of programming languages on the server side, including Java, Perl, Python, and so on. But one language was created expressly for use within Postgres: PL/pgSQL. This language is nearly always included with any Postgres installation ("cluster" in Postgres lingo) whereas the other languages may not be installed by default. Note how our code below declares the language of the function.

The keyword `NEW` provides the generic ability we need to address any table. When the trigger causes the function to run, the word `NEW` represents the table being updated.

Our function calls another function, one of many date-time functions built into Postgres. As mentioned above, these functions vary.

STEP 3 — Trigger

To run that function, we must define a trigger. A trigger is a rule living on the Postgres server that says a function should be run upon certain events happening. In our case, the event we care about is when a record is being modified (`UPDATE` in SQL terminology).

Postgres allows a trigger to run before the row in the actual database is affected by an event operation, or after the event operation. In our case we want to run the trigger before the record is actually updated. Running our trigger after an update would cause an endless loop as our act of setting the current date-time on the `row_modified_` column would be another update which would necessitate our trigger running again, and again, and again forever.

Example Code

Here is some example code  showing the above three steps.

All three steps can be rolled into a single SQL operation inside a transaction. Note the `BEGIN` and `COMMIT` for the transaction boundaries.

BEGIN;

ALTER TABLE customer_
   ADD COLUMN row_modified_ TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT clock_timestamp();

ALTER TABLE invoice_
   ADD COLUMN row_modified_ TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT clock_timestamp();


ALTER TABLE customer_
   ADD COLUMN row_created_ TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT clock_timestamp();

ALTER TABLE invoice_
   ADD COLUMN row_created_ TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT clock_timestamp();


CREATE OR REPLACE FUNCTION update_row_modified_function_()
RETURNS TRIGGER 
AS 
$$
BEGIN
    -- ASSUMES the table has a column named exactly "row_modified_".
    -- Fetch date-time of actual current moment from clock, rather than start of statement or start of transaction.
    NEW.row_modified_ = clock_timestamp(); 
    RETURN NEW;
END;
$$ 
language 'plpgsql';

CREATE TRIGGER row_mod_on_customer_trigger_
BEFORE UPDATE
ON customer_ 
FOR EACH ROW 
EXECUTE PROCEDURE update_row_modified_function_();

CREATE TRIGGER row_mod_on_invoice_trigger_
BEFORE UPDATE
ON invoice_ 
FOR EACH ROW 
EXECUTE PROCEDURE update_row_modified_function_();

COMMIT;


4 comments:

  1. Choose from a range of Time Attendance Software options tailored to suit your business requirements. View our range of time and attendance Software today

    ReplyDelete
  2. Thank you, good example!

    I used your code and modified it for my specific needs: cover both INSERT (because i didn't wanted to have default values) and UPDATE in the same function and also add the current user:

    CREATE OR REPLACE FUNCTION update_row_modified_function_()
    RETURNS TRIGGER
    AS
    $$
    BEGIN
    IF TG_OP = 'INSERT' THEN
    NEW.row_created_ = CURRENT_TIMESTAMP(0);
    NEW.row_created_by_ = CURRENT_USER;
    RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
    NEW.row_modified_ = CURRENT_TIMESTAMP(0);
    NEW.row_modified_by_ = CURRENT_USER;
    RETURN NEW;
    END IF;
    END;
    $$
    language 'plpgsql';

    With a Trigger for both action:

    CREATE TRIGGER row_mod_on_customer_trigger_
    BEFORE INSERT OR UPDATE
    ON customer_
    FOR EACH ROW
    EXECUTE PROCEDURE update_row_modified_function_();

    ReplyDelete