The Structr GraphGist Importer
tl;dr: Our contribution to Neo4j's GraphGist winter challenge: A tool to import and parse a GraphGist and create a schema meta graph for self-describing Neo4j databases in Structr.
Here's the full story:
Neo4j's GraphGist Challenge
When the Neo4j team announced their recent GraphGist challenge we asked ourselves "what can we contribute?". As very committed, long-term Neo4j community members, we really wanted to add something cool and non-trivial.
In the beginning, we had no real idea what to do. The internal data model of Structr's CMS part is basically the DOM tree, which is not a good example because it's too boring (nodes contain other nodes and have nodes as siblings). Besides, our focus is not so much on a specific domain or data model, as Structr is a very generic platform. After seeing the first submissions, it became clear that it would be really hard, if not impossible, to beat those.
Structr's Schema Meta Graph
At the same time, one of the latest improvements we added to Structr was the schema meta graph: A set of special schema-nodes and -relationships describing the domain model of data in Neo4j, making it a self-describing database. You can use the UI to create and modify the meta schema by creating type nodes, adding property definitions, then connecting type nodes with relationships describing the relation between those types.
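To make the idea of type nodes, property definitions and connecting relationships more concrete, here is a minimal Python sketch of such a meta graph. It is not Structr's actual implementation or API: the class names (SchemaNode, SchemaRelationship, SchemaGraph), the method names and the example types are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaNode:
    """A type node of the meta graph (hypothetical name)."""
    name: str
    # Property definitions: property name -> declared data type.
    properties: dict = field(default_factory=dict)


@dataclass
class SchemaRelationship:
    """Describes how two types relate to each other (hypothetical name)."""
    source: str
    rel_type: str
    target: str


@dataclass
class SchemaGraph:
    """Container for type nodes and the relationships between them."""
    nodes: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_type(self, name, properties=None):
        # Create a type node together with its property definitions.
        node = SchemaNode(name, dict(properties or {}))
        self.nodes[name] = node
        return node

    def relate(self, source, rel_type, target):
        # Only types that already exist in the schema can be connected.
        for type_name in (source, target):
            if type_name not in self.nodes:
                raise KeyError("unknown type: " + type_name)
        rel = SchemaRelationship(source, rel_type, target)
        self.relationships.append(rel)
        return rel


# Illustrative types only; they are not taken from a real GraphGist.
schema = SchemaGraph()
schema.add_type("Distillery", {"name": "String", "region": "String"})
schema.add_type("Whisky", {"name": "String", "age": "Integer"})
schema.relate("Distillery", "PRODUCES", "Whisky")
```

Because the schema is itself made of nodes and relationships, it can live in the same database as the data it describes.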
That was already something we thought people would find useful. But wouldn't it be much cooler to just automatically create the schema from an existing data set like ... wait ... a GraphGist?? YEAH!
We were so motivated by the growing number and broad spectrum of GraphGists, most of them of incredible depth and quality, that we decided to build a GraphGist Importer and include it in the long-awaited 1.0 release of Structr.
With such a great pool of test cases at hand, it was real fun to develop the tool, feeding it one GraphGist after the other while improving the auto-detection algorithms to make sure that Structr can process and understand the data model of at least most GraphGists. Our long-term goal is of course to auto-detect any graph model.
We learned some very interesting things about how different people model their domains. There's no "do-it-so-or-it's-wrong", but rather some helpful caveats and best practices. We think it's good to have a flexible model in the persistence layer with only optional constraints, and to put stricter schema control into the application layer.
To determine the type of a node, people used one of the following things:
- labels
- "type" property
- Type nodes
- different property sets
- a mix of the above
For simplicity, we just refer to the node type here; in most models, the type of a relation in the schema can safely be assumed to be Neo4j's relationship type.
The most common way we found in the GraphGists to define a node's type was by using a label. This works with Neo4j 2.0+, and it's probably the best way, as it makes use of the new label functionality in Neo4j 2.0, including all its benefits such as automatically maintained indexes and simpler, better Cypher queries.
Some use a "type" String property with a type identifier in it. This works with all Neo4j versions, and is similar to the way Structr and many other frameworks did and still do persist type information.
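The label and "type" property patterns can be sketched as a simple lookup. This is a minimal illustration, not the importer's actual detection code: the function name node_type, its arguments and the rule for picking one of several labels are assumptions.

```python
def node_type(labels, properties):
    """Return a type name for a node, or None if neither pattern applies.

    labels: iterable of label strings on the node
    properties: dict of the node's properties
    """
    labels = sorted(labels)
    if labels:
        # Assumption: with several labels, pick the alphabetically first one
        # so that the result is deterministic.
        return labels[0]

    # Fall back to a "type" String property holding a type identifier.
    value = properties.get("type")
    if isinstance(value, str) and value.strip():
        return value.strip()

    return None
```

For example, node_type(["Whisky"], {}) and node_type([], {"type": "Whisky"}) both return "Whisky", while a node with neither a label nor a "type" property yields None and has to be classified by one of the other patterns.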
In some cases we found another pattern: many similar nodes connected by relationships of the same type to a single node which holds the type information. This is not always easy to auto-detect, because single relationships can be misinterpreted as type relationships, leading to false-positive "ghost" types.
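One way to reduce ghost types is to require a minimum number of nodes pointing at a candidate type node. The sketch below shows that idea under stated assumptions: relationships are given as (source, relationship type, target) tuples, and the function name find_type_nodes and the min_members threshold are hypothetical, not part of the importer.

```python
from collections import defaultdict


def find_type_nodes(relationships, min_members=2):
    """Find candidate type nodes in a list of (source, rel_type, target) tuples.

    A target counts as a type node only if at least min_members distinct
    sources point at it with the same relationship type.
    """
    members = defaultdict(set)
    for source, rel_type, target in relationships:
        members[(rel_type, target)].add(source)

    return {
        key: sorted(sources)
        for key, sources in members.items()
        if len(sources) >= min_members
    }


# Illustrative data: "a" and "b" share a type node, "a" also has one
# ordinary relationship that must not be mistaken for a type relationship.
rels = [
    ("a", "IS_A", "Whisky"),
    ("b", "IS_A", "Whisky"),
    ("a", "MADE_BY", "x"),
]
candidates = find_type_nodes(rels)
```

With the default threshold, only ("IS_A", "Whisky") is reported. Lowering min_members to 1 would also report ("MADE_BY", "x"), which is exactly the kind of ghost type described above.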
Another, uncommon pattern is to differentiate the types solely by the set of properties stored on a node. This requires a strict and uniform assignment of node properties, and it is sometimes a bit tricky to detect in the case of null values: null or empty values have to be marked with a special property value to be detected as such, so that they are not misinterpreted as a non-existing field, which would lead to another type.
Some GraphGists don't use a clear pattern for node types. While these are still valid graphs in Neo4j (as Neo4j, by its "schema flexible" nature, doesn't force users to use schema elements), it is very difficult to make sense of them in terms of a reliable schema.
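The property-set pattern can be sketched by grouping nodes by the set of property names they carry. This is only an illustration of the idea: the function name group_by_property_set and the NULL_MARKER value are assumptions, not what the importer or any particular GraphGist uses.

```python
from collections import defaultdict

# Hypothetical marker value meaning "this field exists but has no value".
NULL_MARKER = "__null__"


def group_by_property_set(nodes):
    """Group node ids by the set of property names stored on each node.

    nodes: dict mapping node id -> dict of properties
    Returns a dict mapping frozenset of property names -> list of node ids.
    """
    groups = defaultdict(list)
    for node_id, props in nodes.items():
        # Only the property names matter here, not the values.
        groups[frozenset(props)].append(node_id)
    return dict(groups)


nodes = {
    "a": {"name": "x", "age": 16},
    "b": {"name": "y", "age": NULL_MARKER},  # value unknown, field still present
    "c": {"name": "z"},                      # field missing entirely
}
groups = group_by_property_set(nodes)
```

Here "a" and "b" end up in the same group because the marker keeps the "age" field present on "b", whereas "c" lacks the field and is treated as a different type.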
GraphGists are awesome!
They're a perfect way to get people started playing with domain-specific data and understanding the data model. It's so simple to create an initial dataset in a Neo4j database, run queries against the data and see the graph visualization.
And never has it been easier to create a beautiful web or mobile app including a ready-to-use RESTful backend by just importing the GraphGist into a Structr instance! We are even thinking about creating a more sophisticated importer which automatically creates pages including navigation, lists/tables and forms out of a GraphGist.
And today I spent an hour or so to build this little site: Single Malt Scotch Whisky Database. I'll add more queries whenever I find some time. Have fun!