Bystroushaak's blog / English section / Programming / objWiki / Transaction support for simple in-memory database

Transaction support for simple in-memory database

Notes from the implementation of the transaction support to the objWiki.

At the moment, objWiki uses a simple tree structure, where NodeTree stores a flat list of Nodes. Node can each hold a list of Inputs, and anot list of (sub)Nodes.

or more conveniently:

Purpose of this page is to collect all kinds of transaction mechanisms I can think of, that can be implemented on this kind of database.

By Transaction is understood database transaction mechanism, with two-phase commit protocol and rollbacks.

Example: When two clients work on the same node, and then they both call .commit(), second transaction fails, and he second client has to repeat it. First transaction finishes successfully, as it is oldest. After that, first client's changes are now part of the database and second client has to repeat the transaction.

Approaches

Copy on write

Some kind of COW support. Writing creates copy, which is then diffed.

Pros

Cons

Journal, mark & transaction error

Work on the original tree and mark every node that was changed. If you find an already changed element, rollback everything from journal and cancel the transaction. Or depending on the strategy, cancel the other transaction.

Journal can be a list of closures with inverse operations. Should be simple.

Pros

Cons

Deep copy, then diff

Just deep copy the whole tree, work on it, then diff it and merge it back.

Maybe the deep copy could be created from partially loaded nodes, which are copied on the fly only when the transaction actually touches them.

Pros

Cons

Problems

True Copy on write is hard to implement in python, because you can't really work with garbage collector and weak references in good enough granularity. It would probably be possible, but quick research proved that I don't see clear path toward this goal.

I've tried to implement Journaling approach, that is to define rollback operation for all actions that are possible with the NodeTree and Nodes and Input. It worked in principle, but it was way too hard to define inverse operation for each action, and so the probability of bug is relatively high. And also because you have to somehow allow storing transactions in the tree and there may be multiple transactions in place at one moment, whole thing complicates a lot and you end up with the partial copy of the tree anyway.

In the end, I've decided to focus on the Deep copy approach. I create a copy of the tree, which automatically notices the changes that were made. Theese changes are stored in the transaction. When client calls .commit() on the transaction, transaction manager checks whether the transaction conflicts with others and if not, copy is merged back into the original tree, with as little changes as possible.

Implementation

Transaction

Transaction is just simple container for the UUID's of changed stuff:

class TransactionConflict(Exception):
    pass


class Transaction:
    def __init__(self, transaction_manager, abort_immediately=False):
        self.creation_ts = time.time()

        self.transaction_manager = transaction_manager
        self.transactions_to_consider = []

        self.touched_uuids_read = set()
        self.touched_uuids_write = set()
        self.changed_objects = []

        self.invalid = False
        self.finished = False
        self.abort_immediately = abort_immediately

        self.tree = transaction_manager.tree._clone(self)

    def commit(self):
        self.transaction_manager.validate_and_commit(self)

    def invalidate(self):
        self.invalid = True
        self.finished = True
        self.abort_immediately = True

    def add_finished_transaction(self, transaction):
        self.transactions_to_consider.append(transaction)

    def collides_with(self, other):  # single way non-transitional?
        if not self.touched_uuids_read.isdisjoint(other.touched_uuids_write):
            return True

        if not self.touched_uuids_write.isdisjoint(other.touched_uuids_write):
            return True

        return False

    def add_r(self, uuid):
        if self.abort_immediately and self.invalid:
            raise TransactionConflict()

        self.touched_uuids_read.add(uuid)

    def add_w(self, uuid):
        if self.abort_immediately and self.invalid:
            raise TransactionConflict()

        self.touched_uuids_write.add(uuid)

    def add_changed(self, obj):
        self.changed_objects.append(obj)

It is used in such way, that all datastructures that wish to offer transaction support add their UUID to the transaction by calling .add_r() or .add_w() methods.

Most important method is .collides_with() , which gets another transaction as parameter and if other transaction written into set of UUIDs that this trasnsaction have read from or written to, it returns True.

TransactionManager

class TransactionManager:
    tree: NodeTree
    active_transactions: List[Transaction]

    def __init__(self, tree):
        self.tree = tree
        self.active_transactions = []

    def get_transaction(self) -> Transaction:
        t = Transaction(self)
        self.active_transactions.append(t)

        return t

    def validate_and_commit(self, transaction):
        if transaction.invalid:
            self._invalidate_transaction(transaction)

        # transactions that finished in the lifetime of this one
        for finished_transaction in transaction.transactions_to_consider:
            if transaction.collides_with(finished_transaction):
                self._invalidate_transaction(transaction)

        # transactions that begun before this one
        index = self.active_transactions.index(transaction)
        for prior_transaction in self.active_transactions[:index]:
            if transaction.collides_with(prior_transaction):
                self._invalidate_transaction(transaction)  # or invalidate prior?

        self.active_transactions.remove(transaction)
        transaction.finished = True

        self._merge_transaction(transaction)

        # set this as already finished in still running
        for active_transaction in self.active_transactions:
            active_transaction.add_finished_transaction(transaction)

    def _invalidate_transaction(self, transaction):
        transaction.invalidate()
        self.active_transactions.remove(transaction)

        raise TransactionConflict()

    def _merge_transaction(self, transaction):
        self.tree = self.tree._merge_with(transaction.tree)
        self.tree.reindex_node_directory()

TransactionManager provides and tracks transactions. Most interesting method is .validate_and_commit(), which gets a transaction as parameter and checks whether it doesn't conflict with previously finished transactions. Then with actually running transactions, that are older than this one. If there is no conflict, the tree in this transaction is merged to the main tree and transaction is stored in all other transactions, so they would look into it when they are committed.

Read/write tracking

For tracking access to the data structures used in my simple in-memory database, I've created two monstrosities:

class TrackingList(UserList):
    def __init__(self, initlist=None, uuid=None, transaction=None):
        super().__init__(initlist)

        self._uuid = uuid
        self.track = False
        self._transaction = transaction

    def _add_r(self):
        if self._transaction is not None and self.track:
            self._transaction.add_r(self._uuid)

    def _add_w(self):
        if self._transaction is not None and self.track:
            self._transaction.add_w(self._uuid)

    def _log_all_uuids_in(self, item):
        self._add_w()
        if self._transaction and self.track and hasattr(item, "all_uuids"):
            for uuid in item.all_uuids():
                self._transaction.add_w(uuid)

    def __getitem__(self, i):
        self._add_r()
        return super().__getitem__(i)

    def __setitem__(self, i, item):
        if i < len(self.data):
            self._log_all_uuids_in(self.data[i])

        return super().__setitem__(i, item)

    def __delitem__(self, i):
        if i < len(self.data):
            self._log_all_uuids_in(self.data[i])

        return super().__delitem__(i)

    def append(self, item):
        self._add_w()
        return super().append(item)

    def insert(self, i, item):
        self._add_w()
        return super().insert(i, item)

    def pop(self, i=-1):
        self._add_w()
        return super().pop(i)

    def remove(self, item):
        self._add_w()
        return super().remove(item)

    def clear(self):
        self._add_w()
        return super().clear()

    def copy(self):
        self._add_w()
        return super().copy()

    def count(self, item):
        self._add_r()
        return super().count(item)

    def index(self, item, *args):
        self._add_r()
        return super().index(item, *args)

    def reverse(self):
        self._add_w()
        return super().reverse()

    def sort(self, *args, **kwds):
        self._add_w()
        return super().sort(*args, **kwds)

    def extend(self, other):
        self._add_w()
        return super().extend(other)

This is a list, that works as a standard list, but it also tracks all access and stores uuids of the added / changed / removed elements to the transactions.

Second monstrosity is Clonable class, which I won't post here. Basically, it is an ABC class, that implements .__getattribute__() and .__setattr__() which do track access to all properties and store it to the transaction.

Clonable is not explained here, because it also creates partial copies of the trees, which are dynamically loaded from the original tree on the fly. This is to save memory and CPU time by not creating full copy of the tree for each transaction.

Merge

Merge could be implemented simply by replacing the original tree with updated one, but I've wanted to keep as much as possible from the original tree. This is because of the references other transactions / parts of the system may use, which would be suddenly invalid.

Thus, it ended up being a little too complicated recursive replacement of only those parts of the tree, that really changed, but there is nothing really interesting about it. Just recursively call ._merge() for the tree, and if it is new item, use it, and if its old, changed, item, merge it.

Become a Patron