Usually in ML I see input data scaled by subtracting the mean and dividing by the standard deviation, so that every feature ends up centered on 0 with a standard deviation of 1 (a comparable scale across features, though not strictly bounded to -1/1).
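Here is a quick sketch of what standardization actually does, using made-up numbers and assuming NumPy. The result has mean 0 and standard deviation 1, but individual values can fall outside -1/1. Min-max scaling is a different technique, and it is the one that maps the observed range onto exactly -1/1.

```python
import numpy as np

# Made-up feature column, purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Standardization (z-score): subtract the mean, divide by the standard deviation.
z = (x - x.mean()) / x.std()
print(z.mean(), z.std())  # approximately 0 and 1
print(z.max())            # the outlier ends up near 2, i.e. outside [-1, 1]

# Min-max scaling (a different technique) maps the observed min/max onto exactly -1 and 1.
x_mm = 2 * (x - x.min()) / (x.max() - x.min()) - 1
print(x_mm.min(), x_mm.max())  # -1.0 1.0
```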
I'm trying to figure out if or how to do this for feeding price data into an LSTM network. Specifically:
I presume it's important that whatever values I scale the training data by stay consistent across all training batches and the live data being predicted on (e.g. I need to pick fixed numbers; I can't compute the mean/std from the current batch and/or the live input data).
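On the fixed-numbers point, a common pattern is to compute the statistics once from the training set, store them, and reuse exactly those numbers for every batch and for live data. Below is a minimal NumPy sketch. `fit_scaler` and `apply_scaler` are hypothetical helper names. scikit-learn's `StandardScaler` (`fit` on the training data, then `transform` everywhere else) does the same job.

```python
import numpy as np

def fit_scaler(train):
    """Per-column mean/std computed from the training set ONLY."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return mean, std

def apply_scaler(x, mean, std):
    """Apply the stored training statistics to any data (batches or live)."""
    return (x - mean) / std

# Made-up data: rows are time steps, columns are features.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
live = np.array([[4.0, 40.0]])

mean, std = fit_scaler(train)  # persist these (e.g. with np.save) next to the model
train_scaled = apply_scaler(train, mean, std)
live_scaled = apply_scaler(live, mean, std)  # same numbers, never recomputed from live data
```

Live values can land outside the range seen in training. That is expected, and it is one reason to keep the training statistics fixed rather than refitting on whatever arrives.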
Many of my fields are related. For example, I have a field that is the price delta from the previous step (data["Close"].diff() in pandas) and others that are the high/low relative to the close (data["High"] - data["Close"]). Scaling these fields individually seems like it would destroy important information (e.g. subtracting the mean could change the sign of the high/low deltas!?).
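One option for the related fields is to give them all a single shared scale factor and skip the mean subtraction. Dividing by one positive number keeps every sign and the relative sizes between columns. The sketch below uses made-up OHLC bars. Using the root-mean-square of the training deltas as that factor is an assumption. Any fixed positive number would preserve signs just as well.

```python
import numpy as np
import pandas as pd

# Made-up OHLC bars, purely for illustration.
data = pd.DataFrame({
    "High":  [1.1010, 1.1015, 1.1012, 1.1020, 1.1018, 1.1025],
    "Low":   [1.0995, 1.1000, 1.0998, 1.1004, 1.1006, 1.1010],
    "Close": [1.1000, 1.1010, 1.1005, 1.1015, 1.1012, 1.1020],
})

# The related features from the post (raw price units here, not pips).
feats = pd.DataFrame({
    "PriceDelta": data["Close"].diff(),
    "HighDelta": data["High"] - data["Close"],
    "LowDelta": data["Low"] - data["Close"],
}).dropna()  # the first row has no previous close

# Assumption: one shared scale = root-mean-square of all training deltas.
# No mean is subtracted, so zero stays zero and signs are preserved.
train = feats.iloc[:4]
scale = np.sqrt((train.to_numpy() ** 2).mean())
scaled = feats / scale  # the same fixed divisor for every column, train and live
```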
What are some options here? (note: I'm relatively new to ML and very new to trading, so ELI5 ;-))
Here's an example of my data. DayOfWeek and Minutes were scaled the usual way. I'm including them in the hope that, if there are patterns based on time of day or day of week, they'll help the network learn them. Delta values are in pips (distance from the close) and preserve the sign (e.g. HighDelta is always >= 0, LowDelta is always <= 0). FutureDelta is the delta to the price 2 bars ahead (I feed this into the LSTM with the last 10 time slices).
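To make the 10-time-slice setup concrete, here is a sketch that builds a FutureDelta-style target (close 2 bars ahead minus the current close). For each step it stacks the previous 10 rows of features. The result has the (samples, timesteps, features) layout that Keras LSTM layers expect. The synthetic closes, the random features, and the `make_windows` helper are all hypothetical.

```python
import numpy as np

def make_windows(features, target, window=10):
    """For each step t, pair rows t-window+1..t of `features` with target[t]."""
    X, y = [], []
    for t in range(window - 1, len(features)):
        X.append(features[t - window + 1 : t + 1])
        y.append(target[t])
    return np.stack(X), np.array(y)

rng = np.random.default_rng(0)
close = 1.1 + np.cumsum(rng.normal(scale=0.0005, size=30))  # made-up closes

# FutureDelta-style target: close 2 bars ahead minus close now.
# The last 2 bars have no future price, so they are dropped.
future_delta = close[2:] - close[:-2]

# Hypothetical 8 feature columns (e.g. DayOfWeek, Minutes, PriceDelta, ...),
# trimmed to the same length as the target.
features = rng.normal(size=(len(close), 8))[: len(future_delta)]

X, y = make_windows(features, future_delta, window=10)
print(X.shape, y.shape)  # (19, 10, 8) (19,)
```

Each window only uses rows up to t. As a result, the inputs never include the future price that the target is measured against.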
     DayOfWeek    Minutes  PriceDelta  HighDelta  LowDelta  RollingMean30Delta  RollingMean50Delta  RollingMean100Delta  FutureDelta
99   -1.463178  -0.917445        -3.1        3.9      -0.2           -7.186667              -9.008              -10.530         -2.2
100  -1.463178  -0.904834         0.4        0.8      -0.9           -7.230000              -9.126              -10.718         -0.6
101  -1.463178  -0.892223        -0.9        0.9      -1.7           -5.886667              -7.938               -9.661          0.1
102  -1.463178  -0.879613        -1.7        1.8      -0.8           -3.826667              -5.984               -7.820          1.6
103  -1.463178  -0.867002         2.0        0.3      -2.3           -5.433333              -7.714               -9.616          1.0
Submitted September 15, 2019 at 04:59AM by DanTup