Saturday, 22 July 2017

Graph Databases and Neural Nets III

So, after a lot of pain and suffering, I have worked out most of the issues with this system.  The tricky part, of course, was neuron execution.  Here is the problem.  When I execute a neuron's code in this system, that code takes the object the neuron is attached to and an options object as arguments, and it returns a potentially modified options object.  I could go into a long explanation for this, but I won't.  Suffice it to say, I wanted to maintain some sort of state for the results so that as more and more neurons fired, they would get this options object and be able to make decisions based on its values.  Now, when a neuron fires, it runs its own code, then it looks through all of its connections for other neurons to send the data to so they can fire, and so on.  Now, I know you are saying, "Couldn't that go on forever if there are loops in the network?"  Yes, it could, but the system allows a max depth to be set.  Anyway, the neuron essentially finds other neurons and those run, but they run in series, and because the options object potentially changes each time a neuron fires, as things stand, I do not pass the same options each time I send to another neuron.  What does that mean?

Let's say I execute neuron 1.  Neuron 1 is connected to neuron 2 and neuron 3.  I pass the modified options object from the neuron 1 fire to neuron 2, which modifies the object; for now, let's say neuron 2 isn't connected to anything else, so we drop down to neuron 3, which receives the options object as modified by neuron 2.  Now, what if I wanted to pass the same options object that came from the neuron 1 fire to both neuron 2 and neuron 3?  I might want to do that, right?  In fact, I might want to do that more often than not, depending on the type of data I am dealing with.  So, I think I just have to make this an option (the default option).  I will get around to it eventually.
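To make the two behaviors concrete, here is a toy sketch of both propagation modes: "serial" threads one options hash through every downstream neuron (what happens now), while "broadcast" hands each branch its own copy of the parent's result (what I want to make the default).  All names here (fire_neuron, %neurons, the path key) are made up for illustration and are not the module's actual API.

```perl
use strict;
use warnings;

# Each neuron holds a code ref and a list of downstream neuron ids.
my %neurons = (
    1 => { code => sub { my ($o) = @_; $o->{path} .= '1'; $o }, out => [2, 3] },
    2 => { code => sub { my ($o) = @_; $o->{path} .= '2'; $o }, out => [] },
    3 => { code => sub { my ($o) = @_; $o->{path} .= '3'; $o }, out => [] },
);

sub shallow_clone { my ($opts) = @_; return { %$opts } }

sub fire_neuron {
    my ($id, $opts, $mode) = @_;
    $opts = $neurons{$id}{code}->($opts);
    for my $next (@{ $neurons{$id}{out} }) {
        my $passed = $mode eq 'broadcast' ? shallow_clone($opts) : $opts;
        $passed = fire_neuron($next, $passed, $mode);
        $opts = $passed if $mode eq 'serial';   # serial carries changes forward
    }
    return $opts;
}

my $serial    = fire_neuron(1, { path => '' }, 'serial');
my $broadcast = fire_neuron(1, { path => '' }, 'broadcast');
print "$serial->{path}\n";     # 123: neuron 3 saw neuron 2's changes
print "$broadcast->{path}\n";  # 1: each branch got its own copy
```

In broadcast mode the branch copies are simply discarded here; a real run would presumably collect them somehow, which is part of why I haven't written it yet.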

Another issue I have now is that I don't have a formal output process for neurons.  Right now, they just follow connections until they run out of connections or hit their max depth.  What happens after that?  Well, technically, the options object is returned at the end of the run, so you can do what you want with that.  However, it seems that there is a case here for exit procedures of some sort.  I fire a neuron, this cascades to some point, and then I do SOMETHING with the options object at the end of the run.  Now, that can just be baked into whatever code you write that uses this module, or it could be part of the module itself.  I don't know.  Honestly, I am beginning to believe that I have gone completely off the rails.  I am questioning everything.  Where have I gone wrong?  I was happy once.  I had friends (not really).  I believed I could do anything (mania).  But now, I am lost.
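If the exit procedure did end up in the module, one shape it could take is an optional callback the caller hands to the run, invoked with the final options object.  Everything in this sketch (run_network, on_exit, the fired count) is hypothetical, not the module's actual interface:

```perl
use strict;
use warnings;

# Hypothetical run wrapper: cascade the neurons, then hand the final
# options object to an optional exit procedure supplied by the caller.
sub run_network {
    my (%args) = @_;
    my $opts = $args{options} // {};
    # ... the cascade of neuron fires would happen here ...
    $opts->{fired} = 3;   # pretend three neurons ran
    $args{on_exit}->($opts) if $args{on_exit};
    return $opts;
}

my $result = run_network(
    options => { input => 'hello' },
    on_exit => sub {
        my ($o) = @_;
        print "run finished after $o->{fired} fires\n";
    },
);
```

The advantage over "just use the return value" is that the exit procedure could itself live in a node, which keeps everything inside the graph.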

Whatever...

The graph database part of this has reached a point where the only thing left to do is allow advanced key queries.  For example, queries like "node->{key}->{some key} > 10" or "node->{key}->{some key} in ['a','b','c']".  Well, sub-queries would also be nice, but then you get into nasty recursion, and I really don't have the patience for that right now.  I'll leave that to the other poor suckers I've dragged into this nightmare.  Yeah, I said it.  Nightmare.  This is the worst thing that has happened to me...EVER.  Not really.  It is the second worst thing.  The worst thing was when I got a cluster of ticks on a rather sensitive part of my body while on a camping trip.
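Just to pin down what those key queries would have to do, here is a rough sketch of evaluating them: walk the nested keys, then apply an operator.  The query structure here (keys/op/value hashes, deep_get, match) is my own guess at a representation, not the module's real query format.

```perl
use strict;
use warnings;

# Walk a chain of hash keys: deep_get($node, 'a', 'b') is $node->{a}{b}.
sub deep_get {
    my ($node, @keys) = @_;
    my $val = $node;
    $val = $val->{$_} for @keys;
    return $val;
}

# Apply one of the operators from the examples: '>' and 'in'.
sub match {
    my ($node, $query) = @_;
    my $val = deep_get($node, @{ $query->{keys} });
    return 0 unless defined $val;
    my $op = $query->{op};
    return $val > $query->{value} ? 1 : 0 if $op eq '>';
    if ($op eq 'in') {
        return (grep { $_ eq $val } @{ $query->{value} }) ? 1 : 0;
    }
    return 0;
}

my $node = { stats => { score => 12 }, meta => { tag => 'b' } };

# node->{stats}->{score} > 10
print match($node, { keys => ['stats', 'score'], op => '>', value => 10 }), "\n";
# node->{meta}->{tag} in ['a','b','c']
print match($node, { keys => ['meta', 'tag'], op => 'in', value => ['a', 'b', 'c'] }), "\n";
```

Sub-queries would mean a query value could itself be one of these structures, which is exactly the recursion I'm putting off.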

So, lots of progress, and even better, I have this working under strict mode in Perl.  If you don't program Perl, don't worry about that.

So, I have brought one other person into this project and am trying to get another in on it.  The first is a C programmer who thinks this is absolute lunacy.  The second is a perl programmer who taught me how to program in perl.  There is another that I am tempted to bring in, but early probing didn't really generate a lot of interest, so whatever.

What are the goals at this point?
1) Create a Perl module that can be easily used to create in-memory graph databases with neural net hooks.
2) Create a C version of this with more advanced features, including threading, multi-homing, and persistence.
3) Create a Lisp version of this system....just because.  I mean, who wouldn't want a Lisp version of this?
4) Find a new psychiatrist and get on the right meds.


Sunday, 25 June 2017

Graph Databases and Neural Nets II

It works.  Kind of in shock right now because what I have written is pretty abstract and I have had a difficult time really understanding it.  It works.

It works as a graph database.  I tested it using an unstructured data labeling problem with great results.  Basically, I loaded the IMDB database into the system, breaking titles into ngrams, storing both the titles and the ngrams in nodes, and then connecting the ngrams to the titles.  Each ngram connection to a title had a score based on the size and position of the ngram within the title.  Then I took a file containing 3000 filenames, all of which had something to do with movies or TV, and I tried to assign an IMDB title to each filename.  Now, the filenames were terrible, absolutely terrible.  There were misspellings, abbreviations, and a lot of garbage in the names, so my accuracy suffered, but out of 3000 filenames, I was able to accurately label about 1800 of them.  That was just using connection queries to find aggregate scores for ngram matches to titles.  Not that bad, but with some work, like adding in spelling correction and dealing with abbreviations, this could be much better.  Unfortunately, this is not terribly fast.  I have written another solution for this problem that is much faster.  Still, it's a proof of concept.
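The ngram-scoring idea, boiled way down, looks something like this: index each title's character ngrams with a score, then aggregate scores over the ngrams in a filename.  The scoring formula here (length divided by one plus position) is a stand-in I made up; the real weights are whatever the module uses.

```perl
use strict;
use warnings;

# All overlapping character ngrams of length $n, with their positions.
sub ngrams {
    my ($text, $n) = @_;
    my @grams;
    for my $i (0 .. length($text) - $n) {
        push @grams, [ substr($text, $i, $n), $i ];
    }
    return @grams;
}

# Index: ngram -> { title -> aggregate score }.
my %index;
my @titles = ('the matrix', 'the mask');
for my $title (@titles) {
    for my $n (3 .. 5) {
        for my $g (ngrams($title, $n)) {
            my ($gram, $pos) = @$g;
            # assumed weighting: longer ngrams and earlier positions score higher
            $index{$gram}{$title} += $n / (1 + $pos);
        }
    }
}

# Score a filename against every title its ngrams touch; return the best.
sub best_title {
    my ($filename) = @_;
    my %score;
    for my $n (3 .. 5) {
        for my $g (ngrams(lc $filename, $n)) {
            my ($gram) = @$g;
            next unless $index{$gram};
            $score{$_} += $index{$gram}{$_} for keys %{ $index{$gram} };
        }
    }
    my ($best) = sort { $score{$b} <=> $score{$a} } keys %score;
    return $best;
}

print best_title('The.Matrix.1999.mkv'), "\n";   # the matrix
```

In the real system the index lives in the graph (ngram nodes connected to title nodes, score on the connection), but the arithmetic is the same idea.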

It works as a neural net/graph database.  I tested it by loading a book into the system, dividing it into nodes representing words and parts of speech, with connections between words and between parts of speech.  The neurons for this handled different parts of speech and recursively hunted for "next words" to form sentences.  When one neuron finished, it would send its output to a series of other neurons that were connected by virtue of the fact that the nodes the neurons were tied to were connected to other nodes that had neurons....wonky, yes, but it is roughly correct.  When a neuron run hit the iteration limit, the results object that was passed from neuron to neuron was processed and a sentence was formed.  That sentence was "I am not a moron, Kev."  Punctuation was part of the system, so it did put in the comma.  That was the first sentence.  The sentences that followed were not so great, which was what I expected, but getting that first one really made my day.  I should note that that sentence did not exist in the training text, so it was completely created by the neural net.
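Stripped of the neuron machinery, the "next words" hunt is basically a walk over word connections with an iteration limit.  This is a toy illustration with a hand-built connection table, not the module's code; the real version picks among candidates by connection score rather than always taking the first.

```perl
use strict;
use warnings;

# Toy "next word" connections; in the real system these are graph
# connections between word nodes, with scores.
my %next = (
    'i'   => ['am'],
    'am'  => ['not'],
    'not' => ['a'],
    'a'   => ['moron'],
);

sub build_sentence {
    my ($word, $limit) = @_;
    my @out = ($word);
    while (--$limit > 0 and $next{$word}) {
        $word = $next{$word}[0];   # real version: choose by connection score
        push @out, $word;
    }
    return join ' ', @out;
}

print build_sentence('i', 10), "\n";   # i am not a moron
```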

This system is actually just a Perl module that you can use and customize as you see fit.  It is not a lot of code, but it will grow a bit as I add in better support for connection and node queries.  The two examples above each used this library.  The labeler was about 100 lines of code.  The book reader was about 400 lines of code.  Perl is a powerful language.  It is horribly messy if you aren't on top of things, but it is really, really good.  The real downside is that it isn't terribly fast.  I want to translate this to another language.  I have thought about C, but the reality there is, as a friend of mine likes to say, "It is like building a house with tweezers and toothpicks."  Lisp is another possibility, but I am not terribly proficient in Lisp, and I'm not really sure it would perform all that well anyway....Not sure.  Of course, there is Java, but I haven't given up on life quite yet.

Ultimately, this should be a system that can be distributed.  Also, right now, it doesn't save anything to disk, so when the program exits, you lose everything.  I'll get around to dealing with that eventually.

Anyway, the thing works and while it isn't terribly fast, it is pretty powerful.  All the guys at my job make fun of me for being so in love with perl.  They tell me I should be living in a hippy bus.  I haven't told them yet that I actually do live in a hippy bus.


Tuesday, 13 June 2017

Graph Databases and Neural Nets

So, for me, a very lazy person, getting a computer program to write books for me is the holy grail.  So, I spend a good amount of time writing programs that will write books.  So far, my work in this area has not produced anything of substance.

Recently, I started playing with graph databases.  In particular, I chose neo4j for my experiments.  Now, for those of you who don't know anything about databases, here is a quick primer.  If we look at two different types of databases, relational and graph, we see big differences.  Relational databases organize data in tables, which have rows and columns.  Graph databases store data in nodes that can be connected to one another.  These nodes contain data, and the connections can also store information about the type of connection.  So, an example.

In a relational database, I might have a "users" table that contains data on the users for some system.  That might have the columns: username, first name, last name, password.  So, for each user in the table, there is a row.  Now, I might also have a table in there called "books" which stores books associated with users (fair warning: this is not going to be what I would call good database design).  The columns in "books" might be: username, title, pages, genre.  In this case, username in "books" refers to a username in "users."  Thus, columns in "books" are "related" to columns in "users."

In a graph database, I might have a type of node I call a user node and I might put all of the characteristics of my users in nodes of this type.  Then I might have nodes of type book that have the characteristics of books in them, but NOT a reference to users within the book nodes.  Then, I can connect user nodes to book nodes and assign values to the connection.  For instance, I might connect user "mark" to book "Kev" and set one of the properties of the connection to be "author."  I might have another user "sheila" connected to "Kev" with property "critic."
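Since everything else here is Perl, the graph version of that example can be sketched as plain Perl data structures.  This is just an illustration of the shape (the pages value is invented), not how any particular graph database stores things internally:

```perl
use strict;
use warnings;

# Nodes: users and a book, each a hash of properties.
my %nodes = (
    mark   => { type => 'user', first => 'Mark' },
    sheila => { type => 'user', first => 'Sheila' },
    'Kev'  => { type => 'book', pages => 300 },    # pages is made up
);

# Connections carry their own properties, like the relationship role.
my @connections = (
    { from => 'mark',   to => 'Kev', role => 'author' },
    { from => 'sheila', to => 'Kev', role => 'critic' },
);

# "Who is connected to Kev, and how?"
for my $c (grep { $_->{to} eq 'Kev' } @connections) {
    print "$c->{from} -> Kev ($c->{role})\n";
}
```

Notice that the book node never references its users; the relationship lives entirely in the connection, which is the point of the graph model.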

Very exciting stuff, and if you are a geek like me, you will probably immediately see how graph databases could be useful for a variety of things (but not all things...trust me.  I've tried a bunch of stuff and some of it is just way too painful to deal with.)

Anyway, one of my issues with neo4j was the speed at which it allowed me to insert nodes and create connections.  Given that I didn't really need something that fancy, and also given that I had a mad idea of merging neural nets and graph databases, I got rid of neo4j and wrote my own in-memory graph database using Perl...  Yeah, I know.  Perl.  Look, if you are really comfortable with a language and can write code quickly with it, then you will likely use that language for proofs of concept.  Further, Perl has some nice features that allow rapid development of this sort of thing.  Should I ultimately move this to C or C++?  Yes.  But for now, I just want to get it working.

So, graph databases and neural nets.  Why the hell would I want to merge those two things?  Well, to understand that, you probably need a basic understanding of neural nets.  I am going to very briefly describe them and you can do more research if you are so inclined.

Neural nets are computer science's attempt to model the brain with code, or one attempt at that.  There are three main components of a neural net: inputs, neurons, and outputs.  Now, inputs can be things like data from files or databases or whatnot, or they can be outputs from other neurons.  So, basically, you can have networks of neurons taking input from a variety of sources, including each other.  Yay, that's great!  So, uh, why is that interesting?  Well, the neurons use algorithms to react to the data they are fed, and these algorithms create the output that goes wherever it goes.  Now, if you aren't seeing the beginning of a connection between graph databases and neural nets, then start seeing.  Basically, if a graph database node were the equivalent of a neuron in a neural network, with the added benefit of being able to store data, data that could change, data that could impact the functioning of the algorithm and possibly even alter the topology of the network as needed, then you might have a powerful tool for analyzing data and perhaps even creating an "intelligent" system.  I see this configuration as a neuron with a memory.

On the one hand, you have the database aspect, so you can query data and see relationships etc. just like in a normal graph database.  On the other, you have the neural net aspect that gives the database the ability to react to the data and make decisions about the structure of the entire network.  So, your database is kind of self-aware.

Ok.  So, all that said, I first created a basic graph database (in-memory, as opposed to storing data on disk) that allows creation of nodes and connections and lets the user set properties for these nodes and connections at the time of creation or any time thereafter.  It also has a basic query mechanism for finding individual nodes or finding connected nodes.  At present, I can insert 9 million nodes and create 90 million connections in about a minute, which is okay, but not exactly stellar.  If I had the added burden of disk IO, it would slow down dramatically.  But my computer has 64GB of RAM, so I'm not going to deal with disk crap at this point.
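The core described above fits in very little code, which is part of Perl's appeal here.  This is a stripped-down sketch of that kind of in-memory store; the method names (add_node, connect_nodes, and so on) are my guesses, not the module's actual interface:

```perl
use strict;
use warnings;

package TinyGraph;

sub new { bless { nodes => {}, edges => {} }, shift }

# Create a node with properties set at creation time.
sub add_node {
    my ($self, $id, %props) = @_;
    $self->{nodes}{$id} = { %props };
}

# Set or change a property any time after creation.
sub set_prop {
    my ($self, $id, $key, $val) = @_;
    $self->{nodes}{$id}{$key} = $val;
}

# Connect two nodes, with properties on the connection itself.
sub connect_nodes {
    my ($self, $from, $to, %props) = @_;
    $self->{edges}{$from}{$to} = { %props };
}

# Basic query: which nodes is $id connected to?
sub connected {
    my ($self, $id) = @_;
    return keys %{ $self->{edges}{$id} // {} };
}

package main;

my $g = TinyGraph->new;
$g->add_node('a', color => 'red');
$g->add_node('b');
$g->connect_nodes('a', 'b', weight => 5);
$g->set_prop('b', 'color', 'blue');
print join(',', $g->connected('a')), "\n";   # b
```

Everything is hashes of hashes, which is exactly why inserts are fast and why losing it all on exit is the price paid for now.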

Now, neural nets come into the picture, but how?  Well, remember that a neuron has some sort of algorithm associated with it (possibly more than one, but we will get to that in a bit).  So, if a node is a neuron, I needed a way of associating the algorithm (basically a piece of code) with the node, in fact embedding it in the node.  Further, because I like systems that are as dynamic as possible, I want to be able to change algorithms within nodes on the fly.  I had a problem figuring out how I was going to do this with Perl the way I wanted to, but I found a way that I really don't like.  Honestly, what I really wanted was the ability to have the Perl code modify itself while running.  In fact, I wanted the Perl code to be able to generate code, but that is out of reach, I think.  I guess Lisp could handle this, but I just don't have the patience for Lisp (beautiful language, but a tough one for me).
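For what it's worth, one natural way to embed swappable algorithms in Perl is to store code references right in the node hash; this is a sketch of that general approach, not necessarily the way I ended up doing it in the module:

```perl
use strict;
use warnings;

# A node carrying both its data (its "memory") and its algorithm as a
# code reference that can be replaced at runtime.
my %node = (
    data => { count => 0 },
    algo => sub { my ($data, $opts) = @_; $data->{count}++; $opts },
);

# Fire once with the original algorithm.
$node{algo}->($node{data}, {});

# Swap the algorithm on the fly; the node keeps its accumulated data.
$node{algo} = sub { my ($data, $opts) = @_; $data->{count} += 10; $opts };
$node{algo}->($node{data}, {});

print $node{data}{count}, "\n";   # 11
```

Code refs get you runtime swapping for free; what they don't get you is code that writes new code, which is the part that really wants Lisp.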

So, I have nodes/neurons that have data and algorithms now.  I need something to make the algorithms react to the data.  Now, I am running single-threaded Perl (I refuse to use the pthreads perl because I can't wrap my puny brain around it), so I technically have to have one neuron/node fire at a time and have its output then go to all of the other neurons/nodes it is connected to, or to output if it is that type of neuron/node.  Now, if you have 9 million nodes with 90 million connections, you can see that this is not going to be all that fast, which is why this needs to be written in C and run on a supercomputer, but whatever, I'm not doing that because I don't have a supercomputer.  So, here is the plan.  There are a variety of node types, and let's say I have nodes that are essentially trigger nodes or input nodes whose values trigger execution across the network.  I activate those nodes in some order and that propagates through the system until the "run" finishes, and then some other event starts the process up again, or maybe the system, once started, just keeps running...  Not done yet, and I don't really have this part sorted out, but I think I am on the right track.
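The single-threaded run loop I have in mind is basically a work queue: trigger nodes go in first, each fire pushes its connected nodes, and the run ends when the queue drains or a depth limit is hit.  All the names in this sketch are illustrative, and "fire" is reduced to recording the node id:

```perl
use strict;
use warnings;

# Toy topology: one trigger node fanning out to a few others.
my %connections = ( t1 => ['n1', 'n2'], n1 => ['n3'], n2 => [], n3 => [] );
my $max_depth = 5;

# Seed the queue with the trigger nodes at depth 0.
my @queue = map { [ $_, 0 ] } ('t1');
my @fired;

while (my $item = shift @queue) {
    my ($id, $depth) = @$item;
    push @fired, $id;                 # "fire": run the node's algorithm here
    next if $depth >= $max_depth;     # the loop-safety limit from earlier
    push @queue, [ $_, $depth + 1 ] for @{ $connections{$id} // [] };
}

print "@fired\n";   # t1 n1 n2 n3
```

A queue like this fires one node at a time, breadth-first; swapping the shift for a pop would make it depth-first, and either way the depth limit is what keeps cycles from running forever.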

My primary test case is an NLP test case wherein I analyze books that are brought into the database as connected ngrams and so forth.  Somehow, I want to get this sucker to generate language.  So, quite a way to go, although much of the coding for the backend is done.  I'll write more as I get further along.