Store SQLite in Cloudflare Durable Objects
This is my journey of creating a proof-of-concept of persisting an SQLite database in a Cloudflare Durable Object. It is not meant as a tutorial, but rather as something I myself can get back to it, to revisit what and why I did certain things.
tl;dr: A custom SQLite virtual file system and some WASM/WASI compilation magic allow to run SQLite on a Cloudflare Worker and persist it into a Durable Object. POC source can be found at github.com/rkusa/do-sqlite.
The Plan
Everything started with me wanting a way of saving an SQLite database close to my Cloudflare workers (just Workers in the following).
Where to store SQLite on Cloudflare?
While Workers can easily interact with anything that speaks HTTP, I'd like to be able to use something serverless, and preferably something that also lives on the edge, or at least close to it. Cloudflare offers two persistent storage products for its workers right now. There are Workers KV (KV in the following) and Durable Objects (DO in the following). Soon there is also R2 Storage, a S3 compatible object storage, which I am not going to cover here as I don't have access to it yet.
Both Workers KV and Durable objects are key-value stores. The main difference between both is their consistency guarantees. Workers KV is eventually-consistent while Durable Objects is strongly-consistent. All other differences are mostly a result of those different consistencies.
So which one to choose?
First, let's look at their limitations. KV can store values of up to 25 MiB with no total limit for paid accounts (KV Limits). DO has a total limit of 10 GB per account (can be raised by contacting Cloudflare) and a value limit of 128 KiB (DO Limits).
Neither 25 MiB nor 128 KiB is enough as an upper limit for an SQL database. Fortunately, this is not a deal-breaker, as we can split up the SQLite database file into blocks and persist one block as one value.
The SQLite file conveniently consists of one or multiple pages of the same size. The size of a page is a power of two between 512 and 65536 inclusive (see SQLite File Format). The default page size is 4096 bytes. One single read/write call of SQLite never crosses page boundaries. This makes splitting the file into blocks particularly easy.
The maximum page size fits both KV and DO, so neither of them is ruled out yet. Let's think about their consistencies next. If the database file is split across multiple key-value pairs, we have to make sure that all pages (all key-value pairs) are always up to date when accessing the database. This is not guaranteed to be the case with an eventually-consistent storage, so KV is not an option here.
With DO being strongly-consistent, we have our SQLite store. It of course comes with drawbacks compared to KV that are necessary to achieve the higher consistency guarantees. In a nutshell, one DO only exists once. It is unique. Calls to it are routed to its point-of-presence that owns it. This leads to higher latencies compared to KV. The latency depends on where a request hits the Cloudflare network and where the DO is located.
The DO introduction blog post also confirmed that DO is a reasonable choice for this use-case:
That said, every big distributed database – whether it be relational, document, graph, etc. – is, at some low level, composed of "chunks" or "shards" that store one piece of the overall data. The job of a distributed database is to coordinate between chunks.
We see a future of edge databases that store each "chunk" as a Durable Object. By doing so, it will be possible to build databases that operate entirely at the edge, fully distributed with no regions or home location. These databases need not be built by us; anyone can potentially build them on top of Durable Objects. Durable Objects are only the first step in the edge storage journey.
How to store the SQLite database file in a DO?
This is fortunately not as difficult as it might sound. SQLite already offers pluggable file systems - aka. virtual file systems (VFS). So I just have to write my own virtual file system, which stores each page of the database file into a separate DO key-value, and plug it into SQLite.
How to run SQLite on Workers?
SQLite is written in C, while workers is based on V8 isolates, so it mainly runs JavaScript. Fortunately, it also supports running WASM through initialising and calling WASM modules via JavaScript. Emscripten can be used to build WASM from C, but I'd rather use it through Rust (using rusqlite
), so this is what I focus on right away. Workers can also be written entirely in Rust using worker-rs
.
The initial plan
With that all out of the way, the initial plan is to write a worker using worker-rs
. Use rusqlite
to create and use the SQLite database. Save each page of the database file into a separate DO key-value and persisted into the DO using a custom virtual file system.
Compile SQLite to WASM for target wasm-unknown-unknown
With worker-rs
this should just be a matter of using rusqlite
and building the project via wrangler dev
or wrangler publish
. With a plain Rust project, it would be a matter of building via cargo build --target wasm32-unknown-unknown
. I am using the bundled
feature for the build:
rusqlite = { version = "0.26", features = ["bundled"] }
Does it work? Nope.
It doesn't like the usage of <stdio.h>
.
Fortunately, there is an open PR to improve the WASM compatibility: rusqlite/rusqlite#1010.
Let's try using the fork instead:
rusqlite = { git = "https://github.com/trevyn/rusqlite.git", branch = "wasm32-unknown-unknown", features = ["bundled"] }
Which brings us to:
I am on a MacBook and /usr/local/opt/llvm/bin/clang
indeed doesn't exist on my machine. My solution was to first install llvm
via Homebrew:
And then instruct Cargo via environment variables where my clang
is located:
CC=/opt/homebrew/opt/llvm/bin/clang \
Still nope:
Let's also tell it where llvm-ar
is located:
CC=/opt/homebrew/opt/llvm/bin/clang \
Aaannnnd ... it builds!
Build the Virtual File System
Building a VFS for SQLite is a matter of subclassing three objects: sqlite3_vfs
, sqlite3_io_methods
, and sqlite3_file
. A demo implementation that uses POSIX functions is provided, too.
With that, I had enough information to build sqlite-vfs
, a Rust library that wraps all the necessary interactions with the C SQLite library and exposes a much simpler Rust trait. I'll not go into the details of this library here.
Implementing this trait is enough for a simple VFS. I only implemented as much as was necessary for my use-case (e.g. I skipped memory-mapped files). The trait is:
The implementation of the trait for saving each page as a separate object works as follows.
The PagesVfs
is a struct that is generic over the page size.
;
It uses a Pages
struct as the File
associate type, which in return is a map of page index to Page
, the count
of pages in total, and the current offset
for reading and writing. Each Page
is simply its data
and a dirty
flag to keep track of whether it changed. The structs use the rather new Rust feature of const generics
.
The implementation of the trait is rather straight forward. I'll spare you all the boring details. Just a summary of how reading and writing works.
For reading, I wrote a method to get the current page based on the current offset into the database file. This method is used in the std::io::Read
implementation. It reduces everything down to a get_page
method that takes an index and returns the page data for it. With a fixed block/page size, it is quite easy to get the current page (self.offset / PAGE_SIZE
), as well as the offset into it (self.offset % PAGE_SIZE
).
For writing (the std::io::Write
implementation), I only update the in memory state of the page and mark it as changed. Actually persisting the changes happens as part of the flush
implementation. It iterates through all loaded pages and persists them by calling put_page
.
The whole code can be found here: github.com/rkusa/wasm-sqlite/blob/main/wasm/src/vfs.rs. By reducing all down to the two methods get_page
and put_page
(to which I get back later in the post), the VFS ended up being a generic VFS for when you want to persist SQLite pages as separate files/objects. Using it anywhere else than a DO would just be a matter of providing a different implementation to those two methods.
Use the VFS in a DO
Using the VFS in a DO is – in theory – as simple as interacting with the DO's Transactional storage API inside of the get_page
and put_page
methods in the previous code block.
This, however, doesn't work. The VFS is synchronous while calls to the storage are asynchronous. For example, the get
method has the following signature:
key: string, options?: Object : Promise<T | undefined>
The worker runs, same as the browser, as a single thread using cooperative multitasking. It is thus impossible to await an asynchronous method in the synchronous context of the VFS. One might think one could just have an endless loop polling the promise (or future in the Rust context) until it is completed. But this is impossible in a single-threaded environment, since the endless loop would block forever and never switch to actually running any other async task. It would require at least two threads to do that (and it would be a bad idea even then).
Fortunately, the wasm-opt
tool has an option to asyncify methods in a WASM module. It, therefore, rewrites the WASM and allows to pause and resume the execution. This is exactly what I need to call async functions from within the sync context of the VFS.
It has one drawback though. I need control over how the WASM module is initialised so that I can use the Asyncify
wrapper and provide the corresponding async imports. This is something I don't easily have control over using worker-rs
, since worker-rs
takes care of generating all the necessary JavaScript glue code for me.
So one step back. worker-rs
cannot be used. At least for now – I am sure there is a way to make it work. But the easier way forward is to write the worker and DO in JavaScript and write a WASM module that only takes care of querying the SQLite database.
Target: wasm32-unknown-unknown
vs wasm32-wasi
Before diving into the asyncification, I have to take full control over the WASM initialisation and thus its imports and exports.
While I could theoretically still build for wasm32-unknown-unknown
, this target has a major drawback. A lot of libraries automatically use wasm-bindgen
as soon as you build for it.
I want control over the module initialisation, which means that I am writing the glue code for JavaScript myself. This doesn't work if my dependencies assume wasm-bindgen
and generate an endless amount of __wbindgen
imports and exports. Let's inspect the build for wasm32-unknown-unknown
.
Excerpt of the output:
Imports:
Functions:
"__wbindgen_placeholder__"."__wbindgen_describe": [I32] -> []
"__wbindgen_placeholder__"."__wbg_getRandomValues_98117e9a7e993920": [I32, I32] -> []
"__wbindgen_placeholder__"."__wbg_randomFillSync_64cc7d048f228ca8": [I32, I32, I32] -> []
"__wbindgen_placeholder__"."__wbg_process_2f24d6544ea7b200": [I32] -> [I32]
"__wbindgen_placeholder__"."__wbindgen_is_object": [I32] -> [I32]
"__wbindgen_placeholder__"."__wbg_versions_6164651e75405d4a": [I32] -> [I32]
...
Memories:
Tables:
Globals:
Exports:
Functions:
"__wbindgen_describe___wbg_crypto_98fc271021c7d2ad": [] -> []
"__wbindgen_describe___wbg_getRandomValues_98117e9a7e993920": [] -> []
"__wbindgen_describe___wbg_modulerequire_3440a4bcf44437db": [] -> []
"__wbindgen_describe___wbg_randomFillSync_64cc7d048f228ca8": [] -> []
"__wbindgen_describe___wbg_msCrypto_a2cdb043d2bfe57f": [] -> []
...
I had to trim down the output a lot. There are 716 imports/exports in total that have the __wbindgen
prefix.
So wasm32-unknown-unknown
simply isn't going to work. I need kind of a neutral WASM target. Let's see what WASM targets we can choose from:
|
The emscripten
target also assumes JavaScript glue code from my experience, so wasm32-wasi
is the only one left - so I am going with that.
Build SQLite for WASI for wasm32-wasi
If you are wondering, WASI is a modular system interface for WebAssembly. Head over to wasi.dev for a more detailed introduction.
There is some work required to get rusqlite
to build for wasm32-wasi
. Let's walk through all errors and their solutions.
Similar to trying to build for wasm32-unknown-unknown
, it complains about not finding stdio.h
.
cargo:warning=sqlite3/sqlite3.c:13699:10:
cargo:warning=#include cargo:warning=
cargo:warning=1
The previously mentioned PR for wasm32-unknown-unknown
compatibility solved this by including libc .c
files from OpenBSD. My go to solution is different though. I prefer to build using the wasi-sdk
(a WASI-enabled WebAssembly C/C++ toolchain).
At first we need to build the wasi-sdk
. Here is a shell script I am using for that in various projects (you might have to install cmake
and/or ninja
: brew install cmake ninja
):
#!/usr/bin/env sh
NINJA_FLAGS=-v
This clones and builds the wasi-sdk
into ./wasm/wasi-sdk
. The build toolchain can then be found at ./wasm/wasi-sdk/dist/wasi-sdk-14.0/bin
, and the WASI sysroot at ./wasm/wasi-sdk/dist/wasi-sdk-14.0/share/wasi-sysroot
. The build will take a while (~15min on my M1). Alternatively, if you are not on a Mac ARM, you can use a prebuild sysroot and binaries instead.
I found that the easiest way to get cargo to use all the build tools from the wasi-sdk
is to prepend the PATH
env var with the bin
directory of the wasi-sdk
build. While I usually prefer calling cargo
directly and set up everything else in .config/cargo.toml
, I couldn't find a way to set all build tools in .config/cargo.toml
. So a Makefile it is. After putting everything together, the Makefile (at ./wasm/Makefile
) looks like:
cargo_build_flags += --release
endif
: :
With the Makefile above, make build
creates a release build, and make build DEBUG=1
builds a debug build.
Let's try building again:
The previous error is fixed, but a new one is thrown. It now complains about not finding pthread.h
.
cargo:warning=sqlite3/sqlite3.c:27215:10:
cargo:warning=#include cargo:warning=
cargo:warning=1
The wasi-sdk
doesn't have support for pthread.h
yet (see WebAssembly/wasi-libc#209). A workaround for most libraries is to look at build flags that remove everything related to pthreads from the library you are trying to build.
In case of SQLite, I found various flags to achieve that. One being SQLITE_THREADSAFE=0
, which disables code used to operate sqlite safely in multithreaded environments. I'd assume our WASM to be single-threaded, so I think I'd be safe to disable it. However, I didn't dig deeper into this, because adding this build flag brings us to the next set of errors:
cargo:warning=#error
This looks a lot like OS-specific stuff that is not available in WASI. While looking into how to get rid of this code, I found the SQLITE_OS_OTHER
flag. When this flag is set, built-in OS interfaces (for Unix, Windows, and OS/2) are omitted. Exactly what I was looking for.
Let's try building again.
Aaannndd... it builds!
I also found out that with SQLITE_OS_OTHER
being set, it doesn't seem to matter anymore whether SQLITE_THREADSAFE
is set to 1
or 0
- both build.
There is a PR for rusqlite
now, to fix its WASI compatibility (rusqlite/rusqlite#1116
; someone beat me to it).
While it builds now, does it also run? Nope. Trying to run it fails with:
WebAssembly.instantiate(): Import #1 module=\"env\" function=\"sqlite3_os_init\" error: function import requires a callable
But this was to be expected, as the docs clearly state:
Applications must also supply implementations for the
sqlite3_os_init()
andsqlite3_os_end()
interfaces. The usual practice is for the suppliedsqlite3_os_init()
to invokesqlite3_vfs_register()
. SQLite will automatically invokesqlite3_os_init()
when it initializes.
A minimal sqlite3_os_init
would simply register a custom virtual file system:
extern "C"
As a side-note, if sqlite3_os_init
isn't registering a VFS, or if it isn't returning 0
(as in no error), you might get a rather unhelpful error like:
RuntimeError: unreachable
at wasm://wasm/010bf20e:wasm-function[1716]:0x2a1b64
at wasm://wasm/010bf20e:wasm-function[1764]:0x2a84cb
at wasm://wasm/010bf20e:wasm-function[1782]:0x2aa2e3
at wasm://wasm/010bf20e:wasm-function[1829]:0x2b3a23
With being able to build SQLite for wasm32-wasi
, we can continue looking into how to initialise and use it.
WASI imports
When inspecting the build WASM module, you find various required imports:
Imports:
Functions:
"wasi_snapshot_preview1"."random_get": [I32, I32] -> [I32]
"wasi_snapshot_preview1"."clock_time_get": [I32, I64, I32] -> [I32]
"wasi_snapshot_preview1"."fd_write": [I32, I32, I32, I32] -> [I32]
"wasi_snapshot_preview1"."poll_oneoff": [I32, I32, I32, I32] -> [I32]
"wasi_snapshot_preview1"."environ_get": [I32, I32] -> [I32]
"wasi_snapshot_preview1"."environ_sizes_get": [I32, I32] -> [I32]
"wasi_snapshot_preview1"."proc_exit": [I32] -> []
WASI offers a huge amount of methods. Fortunately, it is modular and our module will only require the ones that are actually used. We have to provide the ones it uses or the WASM module isn't going to run otherwise.
While there is a WASI implementation for Workers: cloudflare/workers-wasi
, I prefer to implement each import manually - especially when there are so few and especially while I am still experimenting. This helps me to keep the full picture of what's going on.
Providing my own implementation basically looks like:
The imports are specified when instantiating the WASM module, and are added under the wasi_snapshot_preview1
namespace. As hinted in the snippet above (for clock_time_get
), I usually throw a not implemented error for all imports at first, and only implement them when I see them being used.
The fd_write
is a special one, because I usually only implement it for stdout
and stderr
and not any other file descriptor (so not for actual files):
fd: number,
iovsOffset: number,
iovsLength: number,
nwrittenOffset: number
With that out of the way, it is time to get back to asyncifying the WASM module.
Asyncify WASM
As a recap, the asyncification is necessary to call the async storage APIs from the sync VFS.
The asyncification is achieved by unwinding and rewinding the local stack. With wasm-opt
, this is done on a WASM level as post-build transformation. This makes it a language-agnostic approach. Once a WASM module is asyncified it needs to be used through the JavaScript wrapper Asyncify
. A good introduction to asyncify can be found here: Using asynchronous web APIs from WebAssembly.
This is how it is used. First I had to add WASM imports, which correspond to external functions in Rust:
extern "C"
I am then using those imports in my VFS:
With that integrated, I can build the WASM file as done previously and once build, post-process it via wasm-opt
:
The flags --asyncify --pass-arg [email protected]_page,env.get_page
tell the tool that both the put_page
and get_page
imports are asynchronous.
I then have to initialise the WASM module using the Asyncify
wrapper and provide async implementations of get_page
and put_page
:
;
That's it. The WASM module is now able to interact with the async storage APIs.
Increase Asyncify Stack Size
With all pieces together now, does it run? Yes and no. Depending on the query I've tested, I ran into yet another very helpful error:
RuntimeError: unreachable
at wasm://wasm/010bf772:wasm-function[1833]:0x2b3c6e
Some println!()
debugging later, I've found that the error happens once the WASM execution is resumed. When the WASM execution is paused, all locals at the point of pausing are saved. Before execution is resumed, all those locals are restored. Asyncify
allocates memory to persist those locals between pausing and resuming. This memory has a size of 1024
bytes and is apparently simply not enough for this use-case.
The stack size is unfortunately not configurable. According to Asyncify
's code, it assumes that the first 1024 bytes of the WASM memory are unused and uses that. I'd rather properly allocate memory for it. Since it is writing the stack start and end locations at an offset of 16, I simply overwrite those values with the location of a bigger and properly allocated memory location for now:
;
;
;
new exports.memory.buffer, DATA_ADDR, 2;
That's it. With that I had SQLite running in a Cloudflare Durable Object.
I am not sharing a link to my demo, as I haven't estimated the to be expected costs yet. But if you are curious, you can give it a try yourself using github.com/rkusa/do-sqlite.
Conclusion
Let me emphasise again: this is just an experiment. It is not in a state where I'd use it for an actual app yet. Here are some open points / concerns / future work:
- I am setting SQLite's journal mode to
memory
right now, which could lead to data loss. I'd have to extend the VFS to be capable of also persisting the journal file into the DO. - I am not happy with the the overhead that comes with pausing/resuming via
Asyncify
. - I have a bad feeling about
Asyncify
's stack size. I could imagine that I'd have to increase it further the more the database grows / or the larger the query result is. - The 128MB memory size of Workers might be a limiting factor when the database grows to a certain size. A possible improvement here would be to use an LRU cache for the pages and basically dispose pages that have been read in the past, but were not used recently.
- I need to make some more research and testing to be sure that the DO isn't writing to the storage concurrently.
- The current solution isn't capable of running DB transactions across multiple calls to the DO (and might never be actually).
- I'd need to make an estimation of the expected costs of running SQLite on a DO to feel safe actually deploying it for an app.
- A replication between multiple DO's living on different Cloudflare locations would be neat (maybe based on litestream.io).
The code can be found here:
- github.com/rkusa/sqlite-vfs: Build SQLite virtual file systems (VFS) by implementing a simple Rust trait.
- github.com/rkusa/wasm-sqlite: SQLite compiled to WASM with pluggable page storage.
- github.com/rkusa/do-sqlite: POC that persists SQLite in a Cloudflare Durable Object.
I'll continue experimenting.
Comments: Hacker News | dev.to