Friday, August 1, 2014

Mike Schmidt - Is Eureqa a genetic algorithm?

3:33 AM Posted by Sini
Just saw Michael Schmidt speak at Pivotal Labs about Eureqa.

His presentation was very similar to this one at TEDx.

It was an interesting discussion of his algorithm, which tries to distill understanding out of data rather than just the accurate but mystical predictions that most machine learning algorithms produce. In other words, rather than hiding the prediction behind a trained black box, it seeks to reveal the true features and formulas that transform your data from x to y. For instance, is the formula y = sin(x), y = cos(x), or y = x^2? Finding these transforms of your x parameter is feature generation, which is ordinarily a difficult skill to master, but Eureqa seems to do it with ease.
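To make the "which formula is it?" question concrete, here is a minimal sketch that scores the three formulas above against some data and picks the closest one. The sample data and the helper name `mean_squared_error` are my own assumptions for illustration, not anything shown in the talk.

```python
import math

# Hypothetical sample data, generated here from y = x^2 purely for illustration.
xs = [i / 10 for i in range(-30, 31)]
ys = [x ** 2 for x in xs]

# The candidate formulas named in the post.
candidates = {
    "sin(x)": math.sin,
    "cos(x)": math.cos,
    "x^2": lambda x: x ** 2,
}

def mean_squared_error(f, xs, ys):
    """Average squared gap between a formula's output and the observed y."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Score every candidate and keep the one with the smallest error.
errors = {name: mean_squared_error(f, xs, ys) for name, f in candidates.items()}
best = min(errors, key=errors.get)
print(best, errors[best])
```

With only three hand-picked candidates this is trivial; the hard part Eureqa tackles is that the list of candidates is not given in advance.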

How?

He showed a number of slides that resemble a decision tree, with the nodes being +, -, /, * and various other transformations, but such a tree by itself does not seem to have a built-in feedback loop to tell you whether you were right or wrong.
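Here is a rough sketch of how I picture those trees: each formula is a nested tuple whose internal nodes are operations and whose leaves are `x` or constants. The tuple encoding, the `evaluate` and `error` helpers, and the sample data are my own assumptions, not Eureqa's actual representation. The feedback can then come from outside the tree, by evaluating it on the data and measuring the error.

```python
import math
import operator

# Operations allowed at internal nodes (an illustrative set, not Eureqa's).
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
UNARY = {"sin": math.sin, "cos": math.cos}

def evaluate(node, x):
    """Recursively evaluate an expression tree at the input value x."""
    if node == "x":
        return x
    if isinstance(node, (int, float)):
        return node
    op, *args = node
    if op in UNARY:
        return UNARY[op](evaluate(args[0], x))
    return BINARY[op](evaluate(args[0], x), evaluate(args[1], x))

def error(tree, xs, ys):
    """Mean squared error of the tree's predictions: the missing feedback signal."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# The tree for x*x + sin(x), scored against hypothetical data from that same formula.
tree = ("+", ("*", "x", "x"), ("sin", "x"))
xs = [0.0, 0.5, 1.0]
ys = [x * x + math.sin(x) for x in xs]
print(error(tree, xs, ys))
```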

He also stressed that the approach is very computationally intensive, and that it is only the processing power available these days that makes it possible.

"The search space for equations is infinite."
"The approach that works very well is based on natural selection, particularly Darwinian evolution."

He is using a genetic algorithm to generate a plethora of formulas, which he then tests for accuracy against the data. He goes through a process of killing off the formulas with the weakest predictive quality and cross-pollinating the others at random.
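The loop described above can be sketched roughly as follows. This is my own toy version, not Eureqa's implementation: the tuple trees, the size cap of 31 nodes, the population size, and the names `random_tree`, `crossover`, `fitness` and `evolve` are all assumptions made for illustration. It has no mutation step, only culling and crossover, and it leaves out division to avoid dividing by zero.

```python
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(rng, depth=2):
    """Build a small random formula as a nested tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", 1.0, 2.0])
    return (rng.choice(list(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def size(tree):
    return 1 if not isinstance(tree, tuple) else 1 + size(tree[1]) + size(tree[2])

def subtrees(tree):
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])

def crossover(rng, a, b):
    """Cross-pollinate: replace a random subtree of a with a random subtree of b."""
    donor = rng.choice(list(subtrees(b)))
    def graft(node):
        if not isinstance(node, tuple) or rng.random() < 0.3:
            return donor
        op, left, right = node
        if rng.random() < 0.5:
            return (op, graft(left), right)
        return (op, left, graft(right))
    child = graft(a)
    return child if size(child) <= 31 else a  # cap growth; otherwise keep the parent

def fitness(tree, xs, ys):
    """Mean squared error against the data; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def evolve(xs, ys, generations=30, pop_size=60, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []  # best error seen in each generation
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, xs, ys))
        history.append(fitness(population[0], xs, ys))
        survivors = population[: pop_size // 2]  # kill off the weakest half
        children = [crossover(rng, rng.choice(survivors), rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda t: fitness(t, xs, ys)), history

# Hypothetical data generated from y = x*x + x.
xs = [i / 4 for i in range(-8, 9)]
ys = [x * x + x for x in xs]
best, history = evolve(xs, ys)
print(best, fitness(best, xs, ys))
```

Because the survivors are carried over unchanged, the best error can never get worse from one generation to the next, though nothing guarantees that the exact formula is recovered.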

Another point he stressed, both today and in the video, is that he focused on what is not changing in the data, and how challenging it was to find the simplest non-trivial formulas that describe those rules. He uses the concept of the Pareto frontier for this: the set of formulas for which no other formula is both simpler and more accurate.
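As a small illustration of the Pareto frontier idea, the sketch below keeps only the formulas that no other formula beats on both simplicity and accuracy. The formulas, complexity scores and error values are numbers I made up for the example, and `pareto_front` is a hypothetical helper, not part of Eureqa.

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (complexity, error); lower is better for both."""
    front = []
    for name, complexity, error in candidates:
        # A candidate is dominated if another is at least as good on both
        # measures and strictly better on at least one.
        dominated = any(
            c <= complexity and e <= error and (c < complexity or e < error)
            for _, c, e in candidates
        )
        if not dominated:
            front.append((name, complexity, error))
    return sorted(front, key=lambda item: item[1])

# Hypothetical (formula, complexity, error) triples, invented for illustration.
candidates = [
    ("y = 1", 1, 4.0),
    ("y = x", 1, 2.5),
    ("y = x*x", 3, 0.4),
    ("y = x*x + x", 5, 0.0),
    ("y = x*x*x - 2", 7, 0.9),
    ("y = (x*x + x)*1 + 0", 9, 0.0),
]
front = pareto_front(candidates)
for name, complexity, error in front:
    print(name, complexity, error)
```

The bloated formula at the end fits the data perfectly but drops out, because a simpler formula fits just as well.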

Just a guess, but I would imagine that he would compare and possibly cluster on the most successful formulas to reveal those fundamental rules.