Friday, 4 August 2017

Graph Databases and Neural Nets IV

Writing a query engine for a database seems pretty simple on the surface.  However, when you really dig into it, it sucks and you start to question why you are writing a query engine.  I mean, don't you have better things to do?  Don't you have a family that is wondering why you are locked away in the attic in the dark, mumbling to yourself, occasionally screaming, and more occasionally throwing and breaking things?  Whatever.

The basic problem here is writing a query engine that allows for a wide array of conditions.  For my engine, I wanted to provide, at a bare minimum, the ability to query exact values, >, <, >=, <=, IN, NOT IN and KEY EXISTS.  Sounds pretty simple, right?  How hard could that be?  Well, doing them individually is actually very, very simple.  But when you have complex queries against multiple keys and also add in the ability to use AND and OR for combinations of conditions, it starts to get pretty ugly.
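
To make the shape of the problem concrete, here is a minimal sketch in Python of how those conditions might be evaluated.  It is not the engine's actual code.  The names (OPS, check, matches, query), the tuple format for conditions, and the rule that a missing key never matches are all assumptions made for illustration, and AND/OR is applied once for the whole query to keep the sketch small.

```python
import operator

# Hypothetical operator table; the keys mirror the conditions listed above.
OPS = {
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "IN": lambda value, options: value in options,
    "NOT IN": lambda value, options: value not in options,
}


def check(obj, key, op, operand=None):
    """Evaluate one (key, op, operand) condition against a dict-like object."""
    if op == "KEY EXISTS":
        return key in obj
    if key not in obj:
        return False  # assumption: a missing key never matches
    return OPS[op](obj[key], operand)


def matches(obj, conditions, mode="AND"):
    """Combine every condition with a single, query-wide AND or OR."""
    results = (check(obj, *cond) for cond in conditions)
    return all(results) if mode == "AND" else any(results)


def query(objects, conditions, mode="AND"):
    """Return the objects that satisfy the combined conditions."""
    return [obj for obj in objects if matches(obj, conditions, mode)]


# Made-up sample data, purely for illustration.
nodes = [
    {"name": "a", "weight": 0.2, "type": "input"},
    {"name": "b", "weight": 0.9, "type": "hidden"},
    {"name": "c", "type": "output"},
]

heavy = query(nodes, [("weight", ">", 0.5)])
either = query(nodes, [("weight", ">=", 0.9), ("type", "==", "output")], mode="OR")
print([n["name"] for n in heavy])   # ['b']
print([n["name"] for n in either])  # ['b', 'c']
```

Each condition on its own is a one-liner, which matches the "individually it is simple" point.  The ugliness shows up once AND and OR need to nest per group rather than apply to the whole query, because the flat list of tuples then has to become a tree.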

I have done it, of course, and it works, but it took two tries to get there.  Now it is a horrible mess that needs to be completely rewritten and optimized.  That is going to lead to errors and screaming and throwing and breaking, but I will get it working and working well.

Once that is working, I will move on to things like DATES, REGULAR EXPRESSIONS and then the ability to embed function calls within the queries.  (WHY?  BECAUSE.)

This is a slow process.  It is slow because I am only allowing myself to take small steps between tests.  Tests take time and patience (and an understanding of what the hell needs to be tested, which I often do not entirely have).  So, baby steps.  This approach has led to fewer bugs overall.  However, it goes against my natural inclination to just barrel through things to "get it done."  Frustrating.  Very frustrating, but it is paying off.

So, here is what is working now:
  1. Creation of networks, nodes, connections and neurons
  2. Modification of networks, nodes, connections and neurons
  3. Ability to find connections for any object (node, connection, neuron)
    1. ==, >, <, >=, <=, IN, NOT IN, KEY EXISTS queries working and combinable
    2. AND and OR allowed but only globally for a query
  4. Indexes for object names, types and keys
  5. Neuron execution (recursive, traverses the network, can be slow)
  6. Path analysis (pretty slow for complex networks if depth > 2)
  7. Balance analysis (relationship between incoming and outgoing connections, a value between -1 and 1)
  8. Common connection queries (given a group of objects, what connections are common to them?).  This allows for strict or fuzzy matching and is really just a way of clustering data.
  9. Deep connections.  A depth-limited (configurable) search for connections to a given object
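
Two of the items above, balance analysis and deep connections, are easy to sketch.  The Python below is only an illustration under stated assumptions, not the actual implementation: the edge-list representation, the function names, the formula (incoming - outgoing) / (incoming + outgoing) with its sign convention, and the choice of a breadth-first walk that ignores edge direction are all guesses that happen to fit the descriptions in the list.

```python
from collections import deque


def balance(obj, edges):
    """Assumed formula: (incoming - outgoing) / (incoming + outgoing).

    Gives -1.0 for an object with only outgoing connections, 1.0 for one
    with only incoming connections, and 0.0 for an unconnected object.
    """
    incoming = sum(1 for src, dst in edges if dst == obj)
    outgoing = sum(1 for src, dst in edges if src == obj)
    total = incoming + outgoing
    if total == 0:
        return 0.0
    return (incoming - outgoing) / total


def deep_connections(start, edges, max_depth=2):
    """Breadth-first search that stops expanding at max_depth hops.

    Returns {object: depth} for everything reachable from start.
    Assumption: connections are followed in both directions.
    """
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    depths = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depths[current] == max_depth:
            continue  # depth limit reached, do not expand further
        for nxt in neighbours.get(current, ()):
            if nxt not in depths:
                depths[nxt] = depths[current] + 1
                queue.append(nxt)
    del depths[start]  # the start object is not a connection to itself
    return depths


# Made-up edges as (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e")]
print(balance("a", edges))  # -1.0
print(deep_connections("a", edges, max_depth=2) == {"b": 1, "c": 1, "d": 2})  # True
```

The visited-set check (nxt not in depths) is what keeps a walk like this from looping forever on cyclic networks, and the depth cap is what keeps it from touching the whole graph.
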
It isn't really a ton of code.  There are four files (soon to be more, once I break out the query stuff into its own library).  The code is clean except for the query stuff.  Variable naming is pretty solid.  I follow a set of patterns to keep it from becoming an unreadable nightmare.  So, it is in pretty good shape so far, but there is so much more to do.  I'm probably less than 30% done with it.