Deep Dive · TTL Implementation

TTL Without a Background Thread

How SNKV implements per-key expiry using a dual B-tree index, lazy expiry on read, and an explicit purge — with zero background threads and full ACID guarantees.

SNKV v0.3.0 · C · SQLite B-Tree TTL Embedded KV

Design Goals

Most embedded stores that add TTL reach for one of two mechanisms: a background thread that periodically scans for expired keys, or a scheduler that fires expiry callbacks. Both introduce complexity — thread lifecycle management, locking across expiry and reads, and non-deterministic deletion timing.

SNKV's TTL design avoids both. The constraints were:

ConstraintImplication
No background threadSingle-threaded embedded use; no daemon required
ACID for expiryExpiry deletes must be transactional — no partial state
O(expired) purgeScanning all keys to find expired ones is unacceptable
Zero overheadCFs that never use TTL must pay nothing at all
Per-CF isolationTTL on one CF must not affect another

The result is a dual B-tree index per CF, lazily created on first TTL use, with lazy expiry on every read and an explicit purge_expired() for bulk cleanup.


Dual-Index Architecture

For every column family <X> that uses TTL, two internal B-tree CFs are created:

  User CF "sessions"       (data)
  ┌──────────────────────────────────────────────────────────────┐
  │  key → value                                                 │
  └──────────────────────────────────────────────────────────────┘

  __snkv_ttl_k__sessions   (key index)
  ┌──────────────────────────────────────────────────────────────┐
  │  key → expire_ms (8 bytes, big-endian int64)                 │
  └──────────────────────────────────────────────────────────────┘

  __snkv_ttl_e__sessions   (expiry index)
  ┌──────────────────────────────────────────────────────────────┐
  │  [expire_ms BE64][key] → empty                               │
  │  sorted by expire_ms ascending (B-tree key order)            │
  └──────────────────────────────────────────────────────────────┘

Key Index CF — __snkv_ttl_k__<X>

Maps each key that has an active TTL to its absolute expiry time in milliseconds since Unix epoch. This enables O(log n) point lookup during lazy expiry: when get(key) is called, SNKV looks up key in the key index to find its expiry, compares against kvstore_now_ms(), and evicts if expired.

Why a separate key index?

Storing the expiry timestamp inside the value payload would require reading and decoding the value on every get, even for non-TTL keys. The key index keeps TTL metadata out of the hot data path.

Expiry Index CF — __snkv_ttl_e__<X>

The expiry index is the key to O(expired) purge. Each entry's key is an 8-byte big-endian expire_ms followed immediately by the original data key:

  Expiry index key layout:
  ┌──────────────────┬───────────────────────────────┐
  │  expire_ms BE64  │  original key (variable len)  │
  │  (8 bytes)       │                               │
  └──────────────────┴───────────────────────────────┘

  Example entries (sorted by B-tree key order = sorted by time):
  1735000000000  "session:alice"    ← expires soonest
  1735000000500  "session:bob"
  1735001000000  "token:xyz"        ← expires latest

Because the B-tree sorts entries lexicographically and the timestamp is the first 8 bytes in big-endian order, entries are physically sorted by expiry time. purge_expired() scans from the first entry forward, deleting everything with expire_ms <= now_ms, and stops at the first entry that has not yet expired — without scanning the rest.

Big-Endian Timestamp Encoding

The timestamp must be encoded in big-endian byte order for the B-tree's lexicographic sort to be numerically correct. Little-endian would reverse the byte significance and produce incorrect ordering.

src/kvstore.c — BE64 encode/decode
static void kvstoreEncodeBE64(unsigned char buf[8], int64_t v){
  buf[0] = (unsigned char)((v >> 56) & 0xFF);
  buf[1] = (unsigned char)((v >> 48) & 0xFF);
  buf[2] = (unsigned char)((v >> 40) & 0xFF);
  buf[3] = (unsigned char)((v >> 32) & 0xFF);
  buf[4] = (unsigned char)((v >> 24) & 0xFF);
  buf[5] = (unsigned char)((v >> 16) & 0xFF);
  buf[6] = (unsigned char)((v >>  8) & 0xFF);
  buf[7] = (unsigned char)( v        & 0xFF);
}

static int64_t kvstoreDecodeBE64(const unsigned char buf[8]){
  return ((int64_t)buf[0] << 56) | ((int64_t)buf[1] << 48) |
         ((int64_t)buf[2] << 40) | ((int64_t)buf[3] << 32) |
         ((int64_t)buf[4] << 24) | ((int64_t)buf[5] << 16) |
         ((int64_t)buf[6] <<  8) |  (int64_t)buf[7];
}

Lazy Expiry on Get

There is no background scanner. Expiry is checked on every get call for CFs that have active TTL keys. The check is short-circuited by nTtlActive — a counter of live TTL entries — so CFs without any TTL keys skip the check entirely with a single integer comparison.

src/kvstore.c — lazy expiry in get path
/* Fast path: skip TTL check if no keys have TTL */
if( pCF->hasTtl && pCF->nTtlActive > 0 && pCF->pTtlKeyCF ){

  /* O(log n) point lookup in key index CF */
  void *pTtlVal = NULL; int nTtlVal = 0;
  int rck = kvstoreRawBtreeGet(pKV, pCF->pTtlKeyCF->iTable,
                               pKey, nKey, &pTtlVal, &nTtlVal);
  if( rck == SQLITE_OK && nTtlVal == 8 ){
    int64_t expireMs = kvstoreDecodeBE64(pTtlVal);
    sqlite3_free(pTtlVal);

    if( kvstore_now_ms() >= expireMs ){
      /* Expired — upgrade to write tx, delete from all three tables */
      kvstoreFreeCursor(pCF->pReadCur);   /* invalidate cached cursor */
      pCF->pReadCur = NULL;
      sqlite3BtreeBeginTrans(pKV->pBt, 1, 0);
      pKV->inTrans = 2;

      kvstoreRawBtreeDelete(pKV, pCF->iTable, pKey, nKey);
      kvstoreRawBtreeDelete(pKV, pCF->pTtlKeyCF->iTable, pKey, nKey);
      /* delete from expiry index: key = [expire_ms BE64][original key] */
      kvstoreRawBtreeDelete(pKV, pCF->pTtlExpiryCF->iTable, pExpKey, 8+nKey);

      pCF->nTtlActive--;
      sqlite3BtreeCommit(pKV->pBt);
      return KVSTORE_NOTFOUND;
    }
  }
}
Transaction upgrade

A lazy expiry delete must upgrade the transaction from read to write. SNKV first commits the existing read transaction (releasing the WAL read slot), then begins a write transaction, performs the three deletes atomically, and commits. The cached read cursor is invalidated before the upgrade to avoid dangling cursor state.

The three deletes — data CF, key index, expiry index — happen in a single write transaction. Either all three succeed and commit, or none do. An expired key can never be in a partially-cleaned state.


Put with TTL

kvstore_cf_put_ttl(pCF, pKey, nKey, pVal, nVal, expire_ms) writes to three B-trees atomically within a single write transaction:

  kvstore_cf_put_ttl("session:alice", "tok123", expire_ms=1735000000000)

  Step 1 — write data CF:
    sessions["session:alice"] = "tok123"

  Step 2 — write key index:
    __snkv_ttl_k__sessions["session:alice"] = BE64(1735000000000)

  Step 3 — write expiry index:
    __snkv_ttl_e__sessions[BE64(1735000000000) + "session:alice"] = ""

  Step 4 — commit (all three or none)

Overwrite Correctness

When a plain put() (without TTL) overwrites a key that previously had a TTL, the old TTL entries must be cleaned up. Otherwise, a stale expiry index entry would later cause purge_expired() to delete a key that has already been replaced with a non-expiring value.

src/kvstore.c — TTL cleanup on plain put
/* After a successful plain put — clean up any stale TTL entries */
if( rc == SQLITE_OK && pCF->hasTtl &&
    pCF->nTtlActive > 0 && pCF->pTtlKeyCF ){

  void *pOldTtl = NULL; int nOldTtl = 0;
  int rck = kvstoreRawBtreeGet(pKV, pCF->pTtlKeyCF->iTable,
                               pKey, nKey, &pOldTtl, &nOldTtl);
  if( rck == SQLITE_OK && nOldTtl == 8 ){
    /* Build the expiry index key: [old_expire_ms][original key] */
    unsigned char *pExpKey = sqlite3Malloc(8 + nKey);
    memcpy(pExpKey, pOldTtl, 8);
    memcpy(pExpKey + 8, pKey, nKey);
    kvstoreRawBtreeDelete(pKV, pCF->pTtlExpiryCF->iTable, pExpKey, 8+nKey);
    sqlite3_free(pExpKey);
    kvstoreRawBtreeDelete(pKV, pCF->pTtlKeyCF->iTable, pKey, nKey);
    pCF->nTtlActive--;
  }
  sqlite3_free(pOldTtl);
}

Explicit Purge

kvstore_cf_purge_expired(pCF, &nDeleted) scans the expiry index from the beginning (smallest expire_ms) and deletes all entries where expire_ms <= now_ms. It stops at the first entry that has not expired. Because the expiry index is sorted by time, this is O(expired keys) — not O(all TTL keys).

  Expiry index (sorted ascending by expire_ms):

  1735000000000  "session:alice"   ← expired, delete
  1735000000500  "session:bob"     ← expired, delete
  1735000001000  "session:carol"   ← expired, delete
  1739000000000  "token:xyz"       ← NOT expired, stop here

  purge scans 3 entries, deletes 3, never touches "token:xyz"

Batched Deletion

The purge loop processes up to KVSTORE_PURGE_BATCH = 256 expired keys per transaction. After each batch it commits, releases the mutex briefly, then continues. This prevents a single large purge from holding the write lock for an unbounded time.

src/kvstore.c — purge batch loop
#define KVSTORE_PURGE_BATCH 256

for(;;){
  /* Stack-allocated batch arrays — no heap per batch */
  void *apExpKeys[KVSTORE_PURGE_BATCH];
  int   anExpKeys[KVSTORE_PURGE_BATCH];
  int   nExpired = 0;

  sqlite3_mutex_enter(pCF->pMutex);
  sqlite3_mutex_enter(pKV->pMutex);

  /* Scan expiry index until batch full or first unexpired entry */
  /* ... cursor scan ... */

  if( nExpired == 0 ){
    /* Nothing left to purge */
    sqlite3_mutex_leave(pKV->pMutex);
    sqlite3_mutex_leave(pCF->pMutex);
    break;
  }

  /* Delete batch from all three tables, commit, release mutex */
  /* ... delete loop ... */

  sqlite3BtreeCommit(pKV->pBt);
  sqlite3_mutex_leave(pKV->pMutex);
  sqlite3_mutex_leave(pCF->pMutex);
  /* other threads can run here before next batch */
}

The batch arrays live on the stack and hold only pointers and lengths — approximately 3 KB per iteration (256 × 8-byte pointer + 256 × 4-byte int). The actual key bytes are heap-allocated during the scan and freed immediately after each batch commit.


Zero Overhead for Non-TTL CFs

The TTL index CFs are created lazily — only on the first put_ttl call for a given CF. Until that point, the CF struct fields are:

src/kvstore.c — KVColumnFamily TTL fields
struct KVColumnFamily {
  /* ... other fields ... */
  int            hasTtl;        /* 0 until first put_ttl */
  int            nTtlActive;    /* 0 = no TTL keys alive */
  KVColumnFamily *pTtlKeyCF;    /* NULL until first put_ttl */
  KVColumnFamily *pTtlExpiryCF; /* NULL until first put_ttl */
};

Every get on a CF with hasTtl == 0 or nTtlActive == 0 skips the TTL check with a single branch. There is no per-key overhead for CFs that never call put_ttl.

0
background threads
0
overhead for non-TTL CFs
O(exp)
purge complexity

Per-CF Isolation

Each column family gets its own pair of TTL index CFs, named by appending the CF name to the prefix:

  CF "sessions"  →  __snkv_ttl_k__sessions
                    __snkv_ttl_e__sessions

  CF "tokens"    →  __snkv_ttl_k__tokens
                    __snkv_ttl_e__tokens

  CF "cache"     →  __snkv_ttl_k__cache
                    __snkv_ttl_e__cache

Purging expired keys in sessions has no effect on tokens. The nTtlActive counter is per-CF. TTL state is completely isolated between column families — a CF with no active TTL keys skips the check even if another CF in the same store has thousands of expiring keys.

The __snkv_ttl_ prefix is reserved. The public API rejects any user attempt to create a CF whose name starts with __, preventing accidental collision with internal index CFs.


Iterator TTL Handling

The prefix iterator and full iterator also perform lazy TTL checks on each key they visit. Expired keys are evicted during iteration — they are deleted from all three tables and the iterator advances past them transparently. The caller never sees an expired key from an iterator.

Because the iterator holds its own B-tree cursor (separate from pReadCur), the upgrade from read to write transaction during lazy expiry requires the iterator cursor to be closed before the write, and re-opened and re-positioned after commit. SNKV saves the last seen key before the write and restores the cursor position after.


API Reference

C API

FunctionDescription
kvstore_put_ttl(pKV, key, nKey, val, nVal, expire_ms)Put with absolute expiry time (ms since epoch) on default CF
kvstore_cf_put_ttl(pCF, key, nKey, val, nVal, expire_ms)Same, on named CF
kvstore_get_ttl(pKV, key, nKey, ...)Get with TTL, lazy expiry on read
kvstore_cf_get_ttl(pCF, key, nKey, ...)Same, on named CF
kvstore_ttl_remaining(pKV, key, nKey, &remaining)Seconds remaining for key; KVSTORE_NO_TTL if no TTL set
kvstore_cf_ttl_remaining(pCF, key, nKey, &remaining)Same, on named CF
kvstore_purge_expired(pKV, &nDeleted)Bulk delete all expired keys on default CF
kvstore_cf_purge_expired(pCF, &nDeleted)Same, on named CF
kvstore_now_ms()Current time in ms since Unix epoch

Python API

Python — TTL usage
from snkv import KVStore, NotFoundError

with KVStore("mydb.db") as db:
    # Put with TTL — seconds (float precision)
    db.put(b"session", b"tok123", ttl=60)

    # Dict-style shorthand: db[key, ttl] = value
    db[b"token", 30] = b"bearer-xyz"

    # Get — expired keys raise NotFoundError
    val = db.get(b"session")

    # Check remaining lifetime
    try:
        remaining = db.ttl(b"session")   # float seconds
    except NotFoundError:
        remaining = None

    # Purge all expired keys (returns count removed)
    n = db.purge_expired()

    # Column families support TTL identically
    with db.create_column_family("cache") as cf:
        cf[b"item", 10] = b"data"
        n = cf.purge_expired()

Design Tradeoffs

PropertySNKV (lazy + explicit purge)Background thread (Redis-style)
Expiry precisionExact on read; best-effort on disk until purgeProbabilistic; background thread fires on interval
Read overheadOne extra B-tree lookup per get (skipped if no TTL)None on read path
Disk spaceExpired keys remain on disk until purge is calledReclaimed asynchronously by background thread
Thread modelZero additional threadsOne background thread per store
ACID for expiryYes — all three deletes in one transactionDepends on implementation
Purge complexityO(expired keys)O(sampled keys) or O(all TTL keys)

Expired keys are invisible to reads immediately. They are invisible to disk after purge_expired(). The distinction matters only for storage budgets, not correctness.

For embedded use cases — mobile apps, IoT, CLI tools, edge servers — the zero-thread model is the right default. A background thread that fires every N seconds requires the process to stay alive long enough for it to run, and introduces thread lifecycle complexity that single-threaded embedders simply do not want.

Applications that need aggressive space reclamation can call purge_expired() on a schedule that fits their workload — after each request batch, on a timer, or at startup — and optionally follow with vacuum() to release the freed pages back to the OS.