SQLite as a document database

SQLite has had JSON support for a while.

However recently it added a killer feature: generated columns. (This was added in 3.31.0, released 2020-01-22.)

This makes it possible to insert JSON straight into SQLite and then have it extract data and index them, i.e. you can treat SQLite as a document database. This has been possible with PostgreSQL and obviously is what something like Elastic provides but having it available in an embedded database is very nice for lightweight stuff.

Let's get started:

$ sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Connected to a transient in-memory database.
sqlite> CREATE TABLE t (
   body TEXT,
   d INT GENERATED ALWAYS AS (json_extract(body, '$.d')) VIRTUAL);
sqlite> insert into t values(json('{"d":"42"}'));
sqlite> select * from t WHERE d = 42;
{"d":"42"}|42

It's that simple, the d column is extracted from the provided JSON.

(Aside: The hard bit may be getting a new enough SQLite, at the time of writing Homebrew on macOS has it, else you likely need to use an unstable source like nixpkgs-unstable.)

There's some nice properties of this. Normally it's encouraged to minifiy and validate JSON when inserting (via the json() function) as because SQLite doesn't have a JSON type it will allow anything. However nothing enforces that, you could add a constraint but will probably forget... Having GENERATED ALWAYS using json_extract means invalid JSON will get a Error: malformed JSON at INSERT time.

This can be taken further:

sqlite> CREATE TABLE x (
  body TEXT,
  id TEXT GENERATED ALWAYS AS (json_extract(body, '$.id')) VIRTUAL NOT NULL);
sqlite> insert into x values('');
Error: malformed JSON
sqlite> insert into x values('{}');
Error: NOT NULL constraint failed: x.id

We can enforce items are present in the inserted JSON, here by adding NOT NULL, but we could also use constraints and other SQLite features!

You'll notice I've used VIRTUAL with the generated column in these examples. There's also the option of using STORED to essentially cache the values, although a downside is you can't add those columns via ALTER TABLE.

However you can always create an index on a column, even if it's defined a virtual one:

CREATE INDEX xid on x(id);

Then check that's going to work as expected:

EXPLAIN QUERY PLAN SELECT * FROM x WHERE id='foo';
QUERY PLAN
`--SEARCH TABLE x USING INDEX xid (id=?)

Combined with ALTER TABLE we can add a new column and index it:

ALTER TABLE x ADD COLUMN text TEXT
    GENERATED ALWAYS AS (json_extract(body, '$.text')) VIRTUAL;
INSERT INTO x VALUES(json('{"id":43, "text":"test"}'));
CREATE INDEX xtext ON x(text);

The benefit here is you can start off with a table which could be as simple as just a single JSON column, and add columns and indexes as you find useful data in that JSON. For example this can work really well for webhooks, insert all the data you are sent straight into a table, then pull out the useful stuff later. Have fun.