Insights into what moves us. Contributions from the Structr team and guests. The Structr Blog

Axel Morgner
16. May 2014

About the Robustness of Neo4j

TL;DR

After four years of using Neo4j and developing apps with it, I can say that it is a very reliable database with impressively robust crash recovery, and even in the rare case of invalid records, you can easily fix them.

The Details

This is a topic I have wanted to blog about for a long time. I started playing around with Neo4j in 2010, shortly after the 1.0 release in February of that year, built my first applications and started the development of Structr.

The first project, a hotel database site, went into production in 2011, running on Structr 0.3, with Neo4j 1.2-SNAPSHOT under the hood. You might think "isn't using a SNAPSHOT build in production crazy?", and normally it is. But when you work with a piece of software intensively enough, you get a gut feeling that gives you a good impression of its stability, and I had confidence even in the SNAPSHOT versions of Neo4j. And I crashed it a lot in those days. :-)

Reliable Recovery is the Key

From my first moments with Neo4j, what impressed me most was not its robustness in normal operation (clean shutdown, normal startup). That was something I expected to work flawlessly from an ACID-transactional database that had been used in production for more than 10 years. The really impressive part was the reliability of its recovery after unclean shutdowns or crashes.

In fact, we ran all our embedded Neo4j instances in development mode without clean shutdown for a long while, because it was simply not necessary to shut them down cleanly when the recovery was so reliable. Today, of course, with larger databases in general, and more important data in them, we no longer do this intentionally, but even when Murphy's law strikes now and then, we stay relaxed.
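For anyone who does want the clean shutdown path in an embedded setup, the usual approach on the JVM is a shutdown hook. The following is only a minimal sketch and not code from Structr: `EmbeddedDb` is a hypothetical stand-in interface for whatever database handle your application holds, and `CleanShutdown` is an illustrative helper name. Keep in mind that a hook only covers orderly JVM exits; after a `kill -9` or a power loss you still rely on recovery.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for an embedded database handle; not the Neo4j API. */
interface EmbeddedDb {
    void shutdown();
}

final class CleanShutdown {

    /** Wraps shutdown so it runs at most once, whether called explicitly or by the hook. */
    static Runnable shutdownOnce(EmbeddedDb db) {
        AtomicBoolean done = new AtomicBoolean(false);
        return () -> {
            if (done.compareAndSet(false, true)) {
                db.shutdown();
            }
        };
    }

    /** Registers a JVM shutdown hook and returns the same action for explicit calls. */
    static Runnable registerShutdownHook(EmbeddedDb db) {
        Runnable action = shutdownOnce(db);
        Runtime.getRuntime().addShutdownHook(new Thread(action, "db-shutdown"));
        return action;
    }
}
```

The returned `Runnable` can also be called from regular application teardown; the at-most-once guard keeps the hook from shutting the database down a second time.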

Worst Case Options

One interesting fact about Neo4j is that, in the rare case of a quick recovery failing, you can, as a last resort, still continue to run your application even with some invalid records in the database (kids, don't try that at home!). As long as you don't touch these records while traversing your graph, there's a pretty good chance everything will still work fine. You could, for example, disable certain functions in your application or limit queries to the unharmed parts of your database, and continue until you have a permanent fix for the root problem. Of course, you should never take a situation like this lightly, but it might give you an extra option to consider and buy you some time.
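The idea of limiting queries to the unharmed parts can be illustrated without any Neo4j code at all. The sketch below works on a plain in-memory adjacency map and makes two assumptions that are not from our actual setup: that you already know the ids of the affected nodes (the `quarantined` set), and that your traversal code is the place where you can filter on them. `QuarantinedTraversal` and `reachable` are illustrative names only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class QuarantinedTraversal {

    /**
     * Breadth-first traversal over a plain adjacency map that never visits
     * or expands a node whose id is in the quarantined set.
     */
    static List<Long> reachable(Map<Long, List<Long>> adjacency,
                                Set<Long> quarantined,
                                long start) {
        List<Long> visited = new ArrayList<>();
        if (quarantined.contains(start)) {
            return visited; // refuse to start on a known-bad node
        }
        Set<Long> seen = new HashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            long current = queue.remove();
            visited.add(current);
            for (long next : adjacency.getOrDefault(current, Collections.emptyList())) {
                // Skip quarantined ids before they are ever queued or loaded.
                if (!quarantined.contains(next) && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visited;
    }
}
```

The point of checking the id before queuing is that the affected node is never expanded at all, which mirrors "don't touch these records". Nodes that can only be reached through a quarantined node drop out of the result as well, so a feature built on such a traversal degrades instead of failing.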

Repair

In our four years with Neo4j, we have had only two databases (out of a couple of hundred) with corrupt records. Both issues were caused by unclean shutdowns that were the result of our own mistakes. In one case, a commercial project, the brilliant people of Neo Technology's enterprise support analyzed the database and created a fix in the form of a transaction log, specially crafted to make some low-level changes to the database's persistence layer that fixed the issues when applied. The issue was fixed within hours, with no damage or data loss at all.

In the other case, a community project, we were able to fix the issue with Michael Hunger's awesome store-utils [1], namely the StoreCopy tool, which transformed a 3.5 GB database (including large String indexes) into a fresh 1.6 GB database within 56 seconds, simply by leaving out unused and invalid records.

Head of the messages.log of our oldest production database

Thu Jan 20 01:09:33 CET 2011: Creating new db @ /tmp/neo4j-copy/neostore
Thu Jan 20 01:09:33 CET 2011: Opened [/tmp/neo4j-copy/nioneo_logical.log.1] clean empty log, version=0
Thu Jan 20 01:09:33 CET 2011: Opened [/tmp/neo4j-copy/lucene/lucene.log.1] clean empty log, version=0
Thu Jan 20 01:09:33 CET 2011: Opened [/tmp/neo4j-copy/lucene-fulltext/lucene.log.1] clean empty log
Thu Jan 20 01:09:33 CET 2011: Opened [/tmp/neo4j-copy/index/lucene.log.1] clean empty log, version=0
Thu Jan 20 01:09:33 CET 2011: Extension org.neo4j.graphdb.index.IndexProvider[lucene] initialized ok
Thu Jan 20 01:09:33 CET 2011: Extension org.neo4j.graphdb.index.IndexProvider[spatial] initialized ok
Thu Jan 20 01:09:33 CET 2011: Extension org.neo4j.kernel.KernelExtension[shell] initialized ok
Thu Jan 20 01:09:33 CET 2011: TM new log: tm_tx_log.1
Thu Jan 20 01:09:33 CET 2011: --- CONFIGURATION START ---
Thu Jan 20 01:09:33 CET 2011: Physical mem: 24161MB, Heap size: 1962MB
Thu Jan 20 01:09:33 CET 2011: Kernel version: Neo4j - Graph Database Kernel 1.2-SNAPSHOT (revision: 8112
Thu Jan 20 01:09:33 CET 2011: Operating System: Linux; version: 2.6.26-2-amd64; arch: amd64; cpus: 8
Thu Jan 20 01:09:33 CET 2011: VM Name: OpenJDK 64-Bit Server VM
Thu Jan 20 01:09:33 CET 2011: VM Vendor: Sun Microsystems Inc.
Thu Jan 20 01:09:33 CET 2011: VM Version: 1.6.0_0-b11

So I'm really happy with the robustness of Neo4j and its behaviour under stress conditions, allowing us to concentrate on our development process, our users and our business instead of fiddling around with data consistency issues.

Axel

[1] https://github.com/jexp/store-utils