Tuesday, September 1, 2015
Fixing Japanese Candlestick Charting
What is it:
- A way for investors to visualize and interpret open, high, low, and close (OHLC) price data and identify patterns.
- A number of different patterns, occurring individually or in short sequences, are expected to presage large up or down moves.
Flaws:
There are a number of different flaws, but the most obvious is the lack of proper definitions. Candlestick patterns are often given names like doji, with subtypes such as neutral, long-legged, gravestone, and dragonfly. In researching these names, I found they are defined with pictures... meaning you'll know it when you see it.
But that's the problem: when is a pattern more like one than another?
Will I really know it when I see it?
If I need to find these patterns by visual search, I have no real way to determine the efficacy of investing with this method, because these 'signals' really become a matter of how well an investor interprets market patterns, which may involve more than just these chart patterns.
In order to test hypotheses using candlestick patterns, concrete definitions for those patterns are needed.
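As an example of what a concrete definition could look like, here is a minimal sketch of a doji rule in Python; the 10% body-to-range threshold is my own assumption, not an accepted standard:
----
def is_doji(open_, high, low, close, body_frac=0.10):
    # doji: the open-to-close body is tiny relative to the
    # high-to-low range; the 10% cutoff is an assumed threshold
    rng = high - low
    if rng == 0:
        return False
    return abs(close - open_) / rng <= body_frac

is_doji(100.0, 102.0, 98.0, 100.1)  # => True
----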
This is an unsupervised learning problem.
We don't actually know what patterns exist in the OHLC data. Patterns like dojis may have been named by first looking for big moves and then searching for similarities in the bars that preceded them. However, whether we have an outcome tied to a pattern or not, we need a way to find similar patterns in OHLC data.
There are a number of algorithms that can find these similarities, mostly by calculating the distance between feature vectors and the average of each cluster. Depending on the algorithm, the user may need to supply the number of clusters, perhaps using the elbow method to determine that number, as sketched below.
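A minimal sketch of that process with k-means, assuming scikit-learn and a matrix of normalized candle features (the feature columns here are stand-ins I made up):
----
import numpy as np
from sklearn.cluster import KMeans

# each row is one candle: normalized body, upper shadow, lower shadow
# (stand-in random data; real features would come from ohlc prices)
candles = np.random.rand(500, 3)

# elbow method: fit a range of k values and look for the bend where
# the within-cluster distance (inertia) stops dropping quickly
inertias = []
for k in range(1, 15):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(candles)
    inertias.append(model.inertia_)
----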
In my own calculations, I discovered more than 50 different candlestick patterns by calculating a number of different market measures and normalizing the changes in stock prices.
Friday, August 1, 2014
Mike Schmidt - Is Eureqa a genetic algorithm?
Just saw Michael Schmidt speak at Pivotal Labs about Eureqa.
His presentation was very similar to this one at TEDx.
It was an interesting discussion about his algorithm, which tries to distill understanding out of data, not just accurate and mystical prediction as most machine learning algorithms do. In other words, rather than hiding the prediction behind a trained black box, it seeks to reveal the true features and formulas that transform your data parameters from x to y. For instance, is the formula y = sin(x), y = cos(x), or y = x^2? These transforms of your x parameter are feature generation, ordinarily a difficult skill to master, but Eureqa seems to do it with ease.
How?
He showed a number of slides that resembled a decision tree, with the nodes being +, -, /, * and various other transformations, but such a process does not seem to have an implicit feedback loop to tell you whether you were right or wrong.
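From the slides, a formula seems to be represented as a tree of operators over the inputs. A minimal sketch of that representation (my own encoding, not Eureqa's internals):
----
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

# a formula as a nested tuple: (op, left, right), the variable 'x',
# or a numeric constant; this one encodes x * (x + 1)
tree = ('*', 'x', ('+', 'x', 1.0))

def evaluate(node, x):
    if node == 'x':
        return x
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left, x), evaluate(right, x))

evaluate(tree, 2.0)  # => 6.0
----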
He also stressed that it is very computationally intensive, and that today's processing power is what makes it possible.
"The search space for equations is infinite."
"The approach that works very well is based on natural selection, particularly darwin evolution."
He is using a genetic algorithm to generate a plethora of formulas, which he then tests for accuracy against the data. He goes through a process of killing off the formulas with the weakest predictive quality and cross-pollinating others at random.
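In outline, that loop might look like the sketch below. This is a generic symbolic-regression GA, not Eureqa's actual code; it reuses evaluate from the tree sketch above, and the crossover here is a deliberately naive placeholder for a real random-subtree swap:
----
import random

def fitness(formula, xs, ys):
    # negative squared error against the data; higher is better
    return -sum((evaluate(formula, x) - y) ** 2 for x, y in zip(xs, ys))

def crossover(a, b):
    # assumed helper: graft b's right branch onto a when both parents
    # are operator nodes; a real GA would swap random subtrees
    if isinstance(a, tuple) and isinstance(b, tuple):
        return (a[0], a[1], b[2])
    return random.choice([a, b])

def evolve(population, xs, ys, generations=100, keep_frac=0.5):
    for _ in range(generations):
        # score every formula and kill off the weakest half
        ranked = sorted(population, key=lambda f: fitness(f, xs, ys),
                        reverse=True)
        keep = ranked[:int(len(ranked) * keep_frac)]
        # refill the population by crossing survivors at random
        children = [crossover(random.choice(keep), random.choice(keep))
                    for _ in range(len(population) - len(keep))]
        population = keep + children
    return population
----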
Another point he stressed today and in the video is that he focused on what is not changing in the data, and how it was challenging to find the simplest non-trivial formulas that describe those rules. He uses the concept of the Pareto frontier for this.
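A minimal sketch of a Pareto filter over candidate formulas, assuming each is scored as an (error, complexity) pair, with complexity something like the node count of its tree:
----
def pareto_frontier(candidates):
    # keep the (error, complexity) pairs that no other candidate
    # beats or matches on both axes at once
    return [c for c in candidates
            if not any(o[0] <= c[0] and o[1] <= c[1] and o != c
                       for o in candidates)]

pareto_frontier([(0.1, 9), (0.4, 3), (0.5, 5)])  # => [(0.1, 9), (0.4, 3)]
----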
Just a guess, but I would imagine that he would compare and possibly cluster on the most successful formulas to reveal those fundamental rules.
Saturday, March 1, 2014
BeautifulSoup - Cheat Sheet
- parses HTML by default; can also parse XML
Classes to Import:
- BeautifulSoup
- CData
- ProcessingInstruction
- Declaration
- Doctype
Basic Commands:
import urllib2
from bs4 import BeautifulSoup
# use the line below to download a webpage
html = urllib2.urlopen('web address').read()
soup = BeautifulSoup(html)  # or parse a local file: BeautifulSoup(open('doc.html'))
soup.prettify() => pretty-printed HTML
soup.get_text() => all text
soup.get_text('|', strip=True) => all text as unicode, separate tags with |, remove line breaks
Search
.find()
- ('tag', {'attr' : 'value'})
.find_all()
- string
- string, string
- attr="text"
- attrs={"data-foo": "value"}
- regex
- list
- True => all tags
----
import re
for tag in soup.find_all(re.compile("t")):  # tag names containing 't'
    print(tag.name)
----
def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')

soup.find_all(has_class_but_no_id)
----
Navigation
.a_tag.b_tag => get first b_tag in a_tag
.contents => list
.strings => text from the doc
.stripped_strings => text w/o line breaks
.children
.parent
.next_element => different from .children
.previous_element
.next_sibling
.previous_sibling
Iterables
.descendants
.parents
.next_elements
.previous_elements
.next_siblings
.previous_siblings
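A quick sketch of a few of the accessors above on a toy document (markup invented for illustration):
----
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><p>one<b>two</b></p></body></html>")
soup.body.p          # first <p> inside <body>
soup.p.contents      # [u'one', <b>two</b>]
soup.b.parent        # the enclosing <p> tag
soup.b.next_element  # u'two' - the text node comes before any sibling
for s in soup.stripped_strings:
    print(s)         # one, two
----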
BeautifulSoup Objects
Main
- Tag
- NavigableString
- BeautifulSoup
- Comment
Lesser - all subclass NavigableString
- CData
- ProcessingInstruction
- Declaration
- Doctype
Tag Object
.tag => first tag
.tag.name => tag
.tag.string => text
.tag.get('attr') => use if you don't know whether the attribute is defined
.tag attributes are in a dictionary
- multivalued tag attributes => list
- multivalued tag attributes: class, rev, accept-charset, headers, accesskey
- 'id' is not multivalued => string
- you can change tag attributes
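For example (markup invented for illustration):
----
soup = BeautifulSoup('<p class="body strikeout" id="p1">text</p>')
soup.p['class']        # ['body', 'strikeout'] - multivalued => list
soup.p['id']           # 'p1' - not multivalued => string
soup.p.get('missing')  # None instead of a KeyError
soup.p['class'] = 'intro'  # tag attributes can be changed
----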
NavigableString Object
- tag.string => the text within the tag
- tag.string.replace_with("new text") => replace the text
- use outside of BeautifulSoup by converting to unicode
- unicode(tag.string)
- supports all navigation except .contents, .string, and .find()
BeautifulSoup Object
- whole document
- soup.name => u'[document]'
- supports most navigation
Comment Object
- NavigableString subclass
- <!-- text -->
- display with special formatting when prettified
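For example (markup invented for illustration):
----
soup = BeautifulSoup('<b><!--Hidden note--></b>')
comment = soup.b.string   # u'Hidden note'
type(comment)             # <class 'bs4.element.Comment'>
----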