Tuesday, July 12, 2022

How to really fix django sqlite3 database is locked errors

So, you run a web site with < 1 page view per second. The site is basically a CRUD app and the page is typically generated within a fraction of a second.

You decide that using sqlite as database backend is fine because you don't need to scale.

Then, users making edits complain about random server errors. You find that they are caused by

`sqlite3.OperationalError: database is locked`

Fine. Google the error. There are some stack overflow discussions, but this seems to be the authoritative diagnosis:


https://docs.djangoproject.com/en/4.0/ref/databases/
https://docs.djangoproject.com/en/4.0/ref/databases/

So you try to set the timeout value to a large number. It doesn't work. "Database locked" errors persist. Now you're intrigued.


You run down the rabbit hole of integer overflows (apparently setting timeout value larger than 2147483.647 is a bad idea), second vs millisecond units, in-memory test databases having a different mechanism, etc. All are false leads.

At this point, the average software engineer might be inclined to switch to another database. After all, sqlite is bad at concurrency, right? But you know what others do not - you know what shit-tastic webscale load looks like and your site ain't it. Surely sqlite can handle < 1 request per second!

So you persist. Try to reproduce the problem.

@transaction.atomic
def view1(req):
    sleep(3) # <- insert some artificial delay
    # do_db_writing_stuff...

@transaction.atomic
def view2(req):
    # do_other_db_writing_stuff

Yay, it reproduces! (it seems)

You realize it's easily reproducible (no huge concurrency required). Just run view1 and then view2. And weirdly, if there was a timeout you'd expect view2 to get the database locked error, but surprisingly view1 gets it. Also, this 3 second sleep is lower than the database timeout anyway. This matches the observation that in production loads nothing is stuck for longer than the default database timeout, and it still fails.

So... fake error message? Maybe it's actually a deadlock?

Google "sqlite3 deadlock".

https://stackoverflow.com/questions/55831645/how-does-sqlite-prevent-deadlocks-with-deferred-transactions

Now we're talking!

This stackoverflow question is a true gem -- it laid out the problem clearly, and then basically answered itself. The sqlite default behavior is designed to deadlock, and when it detects one, it just throws an error ("database is locked"). It's worth quoting the original question that lays out the deadlock case:

This sounds like a multiple readers/single writer lock with arbitrary reader-to-writer promotion mechanism, which is known to be a deadlock hazard:

A starts transaction

B starts transaction

A acquires SHARED lock and reads something

B acquires SHARED lock and reads something

A acquires RESERVED lock and prepares to write something. It can't write as long as there are other SHARED locks so it blocks.

B wishes to write so tries to take RESERVED lock. There is already another RESERVED lock so it blocks until it is released, still holding the SHARED lock.

Deadlock.

Looks familiar!

Now you go into site-packages/django/db/backends/sqlite3/ and dig into the code that deals with transactions. Set isolation_level to EXCLUSIVE? Nope, we're supposed to use autocommit anyway. So eventually, we stumble upon this function:

    def _start_transaction_under_autocommit(self):
        """
        Start a transaction explicitly in autocommit mode.

        Staying in autocommit mode works around a bug of sqlite3 that breaks
        savepoints when autocommit is disabled.
        """
        self.cursor().execute("BEGIN")

What if we change it to:

        self.cursor().execute("BEGIN EXCLUSIVE TRANSACTION")

... it works! (no more locking errors if the artificial sleep time is less than 5 seconds)

OK, so why doesn't the Django docs tell me anything about that?

After some digging, you realize that there's a django issue, https://code.djangoproject.com/ticket/29280 titled 

Fix SQLite "database is locked" problems using "BEGIN IMMEDIATE"

that proposes the exact same patch.

Marked invalid of course.

There's good reason to not have this as default for perf reasons ("This means all atomic blocks will be serialized. Combined with ATOMIC_REQUESTS = True, HTTP requests will be serialized. I don't think that's an acceptable behavior.") However, at least tell us in the documentation and let us set an option....................

To recap:

https://docs.djangoproject.com/en/4.0/ref/databases/

Go figure.

No comments:

Post a Comment