You, Kill Johnny?!

Tuesday, September 25, 2018

BA (Boeing) Stock Analysis

1:57 PM Posted by Sini 1 comment

Fixing Japanese Candlestick Charting

8:09 AM Posted by Sini 9 comments

What is it:

A way for investors to visualize and interpret open, high, low, and close (OHLC) price data and to identify patterns in it.

Why:

There are a number of different patterns, occurring individually or in a short sequence, that are expected to presage a large move up or down.

Flaws:

There are a number of flaws; the most obvious is the lack of proper definitions. Candlestick patterns are often given names like doji, with subtypes such as neutral, long-legged, gravestone, and dragonfly. In researching these names, I found they are defined with pictures... meaning you'll know it when you see it.

But that's the problem: when is a pattern more like one than another?

Will I really know it when I see it?

If I need to find these patterns by visual search, I have no real way to determine the efficacy of investing with this method, because these 'signals' really become a matter of how well an investor interprets market patterns, which may involve more than just these chart patterns.

In order to test hypotheses using candlestick patterns, concrete definitions for those patterns are needed.
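As a rough illustration of what a concrete definition could look like, here is a minimal sketch that classifies a single candle as a doji subtype using numeric rules. The `Candle` class, the function name, and both 10% thresholds are assumptions made for this example, not established definitions, and the long-legged subtype is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candle:
    open: float
    high: float
    low: float
    close: float


def classify_doji(c: Candle, max_body: float = 0.1, max_shadow: float = 0.1) -> Optional[str]:
    """Return a doji subtype name, or None if the candle is not a doji.

    Both thresholds are illustrative assumptions, expressed as fractions
    of the candle's high-low range.
    """
    rng = c.high - c.low
    if rng <= 0:
        return None  # no range to measure the body against
    body = abs(c.close - c.open) / rng
    if body > max_body:
        return None  # real body too large to count as a doji
    upper = (c.high - max(c.open, c.close)) / rng  # upper shadow
    lower = (min(c.open, c.close) - c.low) / rng   # lower shadow
    if upper <= max_shadow:
        return "dragonfly"   # open and close both near the high
    if lower <= max_shadow:
        return "gravestone"  # open and close both near the low
    return "neutral"         # small body somewhere in the middle


print(classify_doji(Candle(100, 110, 90, 100.5)))  # small body, two long shadows
print(classify_doji(Candle(100, 106, 99, 105)))    # large body, not a doji
```

Once a rule like this is written down, every candle in a price history gets the same label no matter who runs the search, which is what makes the pattern testable.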

This is an unsupervised learning problem.

We don't actually know what patterns exist in the OHLC data. Patterns like dojis may have been created by looking for big moves, then trying to find similarities in the patterns that preceded them. However, whether we have an outcome from the pattern or not, we need to find similar patterns in OHLC data.

There are a number of different algorithms that can find similarities, mostly by calculating the distance between vectors and the average (centroid) of each cluster. Depending on the algorithm, the user may need to supply the number of clusters, perhaps using the elbow method to choose that number.
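To make the distance-and-average idea concrete, here is a small k-means sketch in plain Python. The feature choice (high, low, and close as fractional changes from the open), the farthest-first initialisation, and the six sample candles are all assumptions for illustration; this is not the set of measures used in my own calculations.

```python
def candle_features(o, h, l, c):
    """Express high, low and close as fractional changes from the open."""
    return ((h - o) / o, (l - o) / o, (c - o) / o)


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise average of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def init_centers(points, k):
    """Farthest-first initialisation: deterministic, spreads the centers out."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Move each center to the average of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def inertia(points, labels, centers):
    """Total squared distance from each point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Made-up (open, high, low, close) rows: three bullish, then three bearish.
candles = [
    (100, 105, 99.5, 104), (50, 52.6, 49.8, 52.1), (200, 209, 199, 208.5),
    (100, 100.5, 95, 96), (50, 50.2, 47.4, 47.9), (200, 201, 191, 191.5),
]
points = [candle_features(*row) for row in candles]

labels, centers = kmeans(points, 2)
print(labels)

# Elbow method: look for the k where the inertia stops dropping sharply.
for k in range(1, 5):
    print(k, inertia(points, *kmeans(points, k)))
```

Dividing by the open is what lets a $50 stock and a $200 stock land in the same cluster when their candles have the same shape.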

In my own calculations, I discovered more than 50 different candlestick patterns by calculating a number of different market measures and normalizing the changes in stock prices.

Mike Schmidt - Is Eureqa a genetic algorithm?

3:33 AM Posted by Sini 41 comments

Just saw Michael Schmidt speak at Pivotal Labs about Eureqa.

His presentation was very similar to this one at TEDx.

It was an interesting discussion about his algorithm, which tries to distill understanding out of data, not just the accurate but mystical predictions that most machine learning algorithms produce. In other words, rather than hiding the prediction behind a trained black box, it seeks to reveal the true features and formulas that transform your data parameters from x to y. For instance, is the formula y = sin(x), y = cos(x), or y = x^2? These transforms of your x parameter are feature generation, which is ordinarily a difficult skill to master, but Eureqa seems to do it with ease.

How?

He showed a number of slides that resemble a decision tree, with the nodes being +, -, /, * and various other transformations, but such a process does not seem to have an implicit feedback loop to tell you whether you were right or wrong.

He also stressed that the processing power available these days is what makes this possible, so it is very computationally intensive.

"The search space for equations is infinite."
"The approach that works very well is based on natural selection, particularly Darwinian evolution."

He is using a genetic algorithm to generate a plethora of formulas, which he then tests for accuracy against the data. He goes through a process of killing off the formulas with the weakest predictive quality and cross-pollinating others at random.

Another point he stressed today and in the video is that he focused on what is not changing in the data, and how it was challenging to find the simplest non-trivial formulas that describe those rules. He uses the concept of the Pareto frontier for this.
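The generate, test, kill, and cross-pollinate loop can be sketched in a few dozen lines. This is only my toy reading of the idea, not Eureqa's implementation: the tuple-based expression trees, the operator set, the population size, the size cap, and the target y = sin(x) + x^2 are all assumptions made for the example.

```python
import math
import random

BINARY = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}
UNARY = {'sin': math.sin, 'cos': math.cos}


def random_expr(rng, depth=3):
    """Build a random formula as nested tuples; leaves are 'x' or a constant."""
    if depth == 0 or rng.random() < 0.3:
        return 'x' if rng.random() < 0.7 else round(rng.uniform(-2, 2), 2)
    if rng.random() < 0.3:
        return (rng.choice(list(UNARY)), random_expr(rng, depth - 1))
    return (rng.choice(list(BINARY)), random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def evaluate(expr, x):
    if not isinstance(expr, tuple):
        return x if expr == 'x' else expr
    op, *args = expr
    vals = [evaluate(a, x) for a in args]
    return UNARY[op](*vals) if len(vals) == 1 else BINARY[op](*vals)


def error(expr, data):
    """Mean squared error against the data; unusable formulas score infinity."""
    try:
        mse = sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)
    except (ValueError, OverflowError):
        return math.inf
    return mse if math.isfinite(mse) else math.inf


def size(expr):
    return 1 + sum(size(c) for c in expr[1:]) if isinstance(expr, tuple) else 1


def subtrees(expr, path=()):
    """Yield the index path to every node in the tree."""
    yield path
    if isinstance(expr, tuple):
        for i, child in enumerate(expr[1:], start=1):
            yield from subtrees(child, path + (i,))


def get(expr, path):
    for i in path:
        expr = expr[i]
    return expr


def replace(expr, path, new):
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace(expr[i], path[1:], new),) + expr[i + 1:]


def crossover(a, b, rng):
    """Swap a random subtree of b into a random position in a."""
    pa = rng.choice(list(subtrees(a)))
    pb = rng.choice(list(subtrees(b)))
    return replace(a, pa, get(b, pb))


def evolve(data, pop_size=200, generations=30, seed=0, max_size=25):
    rng = random.Random(seed)
    pop = [random_expr(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: error(e, data))
        survivors = pop[:pop_size // 2]  # kill off the weakest half
        children = []
        while len(survivors) + len(children) < pop_size:
            child = crossover(rng.choice(survivors), rng.choice(survivors), rng)
            if size(child) <= max_size:  # keep formulas from bloating
                children.append(child)
        pop = survivors + children
    return min(pop, key=lambda e: error(e, data))


data = [(x / 2, math.sin(x / 2) + (x / 2) ** 2) for x in range(-6, 7)]
best = evolve(data)
print(best, error(best, data))
```

The feedback loop I was missing in the tree slides is the `error` function: each formula is scored against the data, and only the better half gets to breed.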
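Here is a small sketch of how a Pareto frontier picks out the simplest formulas that are still accurate. The candidate formulas, complexity scores, and error values are made up for illustration, and measuring complexity as a node count is my assumption.

```python
def pareto_front(candidates):
    """Keep candidates that no simpler-or-equal candidate beats on error.

    Each candidate is (name, complexity, error); lower is better for both.
    """
    front = []
    best_error = float('inf')
    for name, complexity, err in sorted(candidates, key=lambda c: (c[1], c[2])):
        if err < best_error:  # strictly better than everything simpler
            front.append((name, complexity, err))
            best_error = err
    return front


# Hypothetical formulas with made-up (complexity, error) scores.
candidates = [
    ("y = c", 1, 4.0),
    ("y = x", 1, 2.5),
    ("y = x^2", 3, 0.9),
    ("y = x*cos(x)", 4, 1.7),
    ("y = x^2 + c", 5, 0.9),
    ("y = sin(x) + x^2", 6, 0.1),
    ("y = sin(x) + x^2 + 0*x^3", 15, 0.1),
]

for name, complexity, err in pareto_front(candidates):
    print(complexity, err, name)
```

Formulas that add complexity without reducing the error fall off the frontier, which is one way to filter out the trivial and the bloated at the same time.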

Just a guess, but I would imagine that he would compare and possibly cluster on the most successful formulas to reveal those fundamental rules.

BeautifulSoup - Cheat Sheet

9:02 PM Posted by Sini 4 comments

parse HTML by default, can parse XML

Classes to import (from bs4):

BeautifulSoup
CData
ProcessingInstruction
Declaration
Doctype

Basic Commands:

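from urllib.request import urlopen  # Python 3; on Python 2 use urllib2.urlopen
from bs4 import BeautifulSoup

# use the line below to download a webpage
html = urlopen('web address').read()

# parse a local file; naming the parser avoids a warning
soup = BeautifulSoup(open('doc.html'), 'html.parser')

soup.prettify() => nicely indented HTML as a string

soup.get_text() => all text

soup.get_text('|', strip=True) => all text as unicode, strings separated by |, surrounding whitespace stripped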
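A quick way to try the basic commands without downloading anything is to parse a literal string. The HTML snippet here is made up for the example.

```python
from bs4 import BeautifulSoup

html = "<html><body><h1>Title</h1><p>First <b>bold</b> line.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.prettify())  # one tag per line, indented by nesting depth

text = soup.get_text()
print(text)

piped = soup.get_text("|", strip=True)
print(piped)
```

Without a separator, the strings from adjacent tags run together, which is why the `'|'` form is handy when the text needs splitting afterwards.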

Search

.find()

('tag', {'attr' : 'value'})

.find_all()

string
string, string
attr='text'
attrs={"data-foo": "value"}
regex
list
True => all tags

----

import re

for tag in soup.find_all(re.compile("t")):
    print(tag.name)

----

def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')

soup.find_all(has_class_but_no_id)

----
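The filter types listed under .find() and .find_all() can be tried side by side on a small document. The markup and attribute values are invented for the example.

```python
import re
from bs4 import BeautifulSoup

html = """
<div id="main" class="box">
  <a class="link" href="/one">One</a>
  <a class="link ext" href="http://example.com" data-foo="value">Two</a>
  <p class="note">Three</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a", {"class": "link"})            # first match only
all_links = soup.find_all("a")                            # string: tag name
both = soup.find_all(["a", "p"])                          # list: any of these tags
by_kw = soup.find_all(href=re.compile("^http"))           # keyword attr + regex
by_attrs = soup.find_all(attrs={"data-foo": "value"})     # attrs dict for odd names
every_tag = soup.find_all(True)                           # True: every tag

print(first_link["href"])
print([t.name for t in every_tag])
```

The `attrs` dictionary is needed for names like `data-foo`, which cannot be written as Python keyword arguments.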

Navigation

.a_tag.b_tag => get first b_tag in a_tag

.contents => list

.strings => text from the doc

.stripped_strings => text w/o line breaks

.children

.parent

.next_element => different than .children

.previous_element

.next_sibling

.previous_sibling

Iterables

.descendants

.parents

.next_elements

.previous_elements

.next_siblings

.previous_siblings
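A short sketch of the navigation attributes on a tiny document (the markup is made up). It also shows how .next_element differs from .next_sibling: the first follows parse order, the second stays at the same level.

```python
from bs4 import BeautifulSoup

html = ("<html><head><title>Doc</title></head>"
        "<body><p>One</p><p>Two <b>bold</b></p></body></html>")
soup = BeautifulSoup(html, "html.parser")

first_p = soup.body.p                 # first <p> inside <body>
print(first_p.string)

print(len(soup.body.contents))        # direct children as a list
print(first_p.parent.name)

print(first_p.next_element)           # the string inside the first <p>
print(first_p.next_sibling)           # the second <p>

print(list(soup.stripped_strings))
print([t.name for t in soup.body.descendants if t.name])  # strings have no name
print([p.name for p in first_p.parents])
```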

BeautifulSoup Objects

Main

Tag
NavigableString
BeautifulSoup
Comment

Lesser - all subclass NavigableString

CData
ProcessingInstruction
Declaration
Doctype

Tag Object

.tag => first tag

.tag.name => tag

.tag.string => text

.tag.get('attr') => use if you don't know if the attribute is defined

.tag attr are in a dictionary

multivalued tag attributes => list
multivalued tag attributes - class, rev, accept-charset, headers, accesskey
‘id’ is not multivalued => string
you can change tag attributes
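The attribute behaviour above, shown on a single made-up tag: class comes back as a list, id as a string, and .get() returns None for a missing attribute instead of raising an error.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p id="intro" class="lead big">Hello</p>', "html.parser")
tag = soup.p

print(tag.name)
print(tag.string)
print(tag["class"])     # multivalued attribute => list
print(tag["id"])        # single-valued attribute => string
print(tag.get("href"))  # missing attribute => None, no KeyError

# Attributes behave like dictionary entries, so they can be changed or deleted.
tag["id"] = "summary"
del tag["class"]
print(tag)
```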

NavigableString Object

tag.string => text within a tag
tag.string.replace_with("any_text") => replace the text
use outside of BeautifulSoup by converting to a plain string
str(tag.string)  (unicode(tag.string) on Python 2)
supports all navigation except .contents, .string, and .find()

BeautifulSoup Object

whole document
soup.name => '[document]'
supports most navigation

Comment Object

NavigableString subclass
<!-- text -->
display with special formatting when prettified
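A brief sketch tying the string-like objects together, using a made-up snippet with one text node and one comment.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

soup = BeautifulSoup("<p>old text<!-- a note --></p>", "html.parser")
print(soup.name)                      # the BeautifulSoup object itself

text_node, comment = soup.p.contents  # a NavigableString and a Comment

is_comment = isinstance(comment, Comment)
is_string = isinstance(comment, NavigableString)  # Comment is a subclass
comment_text = str(comment)           # the text without the <!-- --> markers

text_node.replace_with("new text")    # swap the string in place
print(soup.p)
```

Checking `isinstance(node, Comment)` is a simple way to skip comments when collecting text from `.strings`.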

Tuesday, September 1, 2015

Fixing Japanese Candlestick Charting

Friday, August 1, 2014

Mike Schmidt - Is Eureqa a genetic algorithm?

Saturday, March 1, 2014

BeautifulSoup - Cheat Sheet