Reputation: 5304
I have an OHLC dataframe (Open, High, Low, Close) of sensor data on a per-minute basis. I need to scale the values, but all with the same scale. The scale needs to use the minimum and maximum across all four columns. For example, the minimum could be in column 'Low' and the maximum could be in column 'High'. I want to fit the scaler on that range, from min(df['Low']) to max(df['High']).
I am currently using the MinMaxScaler from sklearn.preprocessing. However, it scales each column on its own, so to get a single scale I can only fit it to one column. If I fit it to column df['Open'] and transform another column, the values are no longer between 0 and 1 but can be < 0 or > 1.
How can I use the full range of all columns in the scaler?
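To illustrate the problem, here is a minimal sketch. The sample values are made up for illustration only, and the scaler is fitted on NumPy arrays so that one column's fit can be applied to another column.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample data, chosen only to show the effect.
df = pd.DataFrame({
    'Open': [1.0, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1.0, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})

scaler = MinMaxScaler()
# The scaler learns its min/max from 'Open' only.
scaler.fit(df[['Open']].to_numpy())

# Applying that fit to other columns leaves the [0, 1] range.
high_scaled = scaler.transform(df[['High']].to_numpy())
low_scaled = scaler.transform(df[['Low']].to_numpy())

print(high_scaled.max() > 1)  # True: 'High' exceeds the fitted maximum
print(low_scaled.min() < 0)   # True: 'Low' falls below the fitted minimum
```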
Upvotes: 0
Views: 3449
Reputation: 5304
If anybody ends up on this page: I found another way of doing this, which involves reshaping the data with NumPy into a single column and feeding that into the scaler. Reshaping the result back and creating a new dataframe from it solved my issue:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# Kudos to Nick; I used his df to illustrate my example.
df = pd.DataFrame({
'Open': [1, 1.1, 0.9, 0.9],
'High': [1.2, 1.2, 1.1, 1.3],
'Low': [1, 1.0, 0.8, 0.7],
'Close': [1.1, 1.2, 0.8, 1.2]
})
scaler = MinMaxScaler()
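# Flatten all values into one column so the scaler sees a single feature
df_np = scaler.fit_transform(df.to_numpy().reshape(-1, 1))
# Restore the original shape instead of hard-coding the row count
df = pd.DataFrame(df_np.reshape(df.shape), columns=df.columns)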
#        Open      High       Low     Close
# 0  0.500000  0.833333  0.500000  0.666667
# 1  0.666667  0.833333  0.500000  0.833333
# 2  0.333333  0.666667  0.166667  0.166667
# 3  0.333333  1.000000  0.000000  0.833333
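One advantage of going through the scaler is that the fitted object can be reused on later data and inverted. Below is a rough sketch of that idea; the helper names fit_global_scaler and apply_global_scaler and the new_df values are my own assumptions, not part of sklearn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def fit_global_scaler(frame):
    """Fit one MinMaxScaler on all values of `frame`, treated as a single feature."""
    scaler = MinMaxScaler()
    scaler.fit(frame.to_numpy().reshape(-1, 1))
    return scaler


def apply_global_scaler(scaler, frame, inverse=False):
    """Scale (or un-scale) every column of `frame` with the same fitted scaler."""
    values = frame.to_numpy().reshape(-1, 1)
    func = scaler.inverse_transform if inverse else scaler.transform
    out = func(values).reshape(frame.shape)
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)


train_df = pd.DataFrame({
    'Open': [1.0, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1.0, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})
# Hypothetical later observation, scaled with the range learned from train_df.
new_df = pd.DataFrame({'Open': [1.0], 'High': [1.4], 'Low': [0.9], 'Close': [1.3]})

scaler = fit_global_scaler(train_df)
train_scaled = apply_global_scaler(scaler, train_df)
new_scaled = apply_global_scaler(scaler, new_df)
restored = apply_global_scaler(scaler, train_scaled, inverse=True)
```

Note that values in new_df outside the fitted range (here the High of 1.4) still end up above 1, because the scaler keeps the minimum and maximum it saw during fitting.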
Upvotes: 4
Reputation: 147146
You could normalise all columns by doing the math yourself, using df.min().min() and df.max().max() to get the minimum and maximum values over the entire dataframe, or more simply df['Low'].min() and df['High'].max() to get the minimum and maximum values from the Low and High columns respectively. For example:
import pandas as pd

df = pd.DataFrame({
'Open': [1, 1.1, 0.9, 0.9],
'High': [1.2, 1.2, 1.1, 1.3],
'Low': [1, 1.0, 0.8, 0.7],
'Close': [1.1, 1.2, 0.8, 1.2]
})
df
#    Open  High  Low  Close
# 0   1.0   1.2  1.0    1.1
# 1   1.1   1.2  1.0    1.2
# 2   0.9   1.1  0.8    0.8
# 3   0.9   1.3  0.7    1.2
# Avoid the names min/max so the Python builtins are not shadowed
min_val = df.min().min()  # or df['Low'].min()
max_val = df.max().max()  # or df['High'].max()
norm = (df - min_val) / (max_val - min_val)
norm
#        Open      High       Low     Close
# 0  0.500000  0.833333  0.500000  0.666667
# 1  0.666667  0.833333  0.500000  0.833333
# 2  0.333333  0.666667  0.166667  0.166667
# 3  0.333333  1.000000  0.000000  0.833333
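The df['Low'].min() / df['High'].max() shortcut relies on Low and High actually holding the extremes, which is the case for well-formed OHLC data. If the data might contain bad rows, a quick check along these lines could help; the helper name low_high_bound_all and the bad value are made up for illustration.

```python
import pandas as pd


def low_high_bound_all(frame):
    """Return True if 'Low'/'High' hold the frame-wide minimum/maximum."""
    return bool(
        frame['Low'].min() == frame.min().min()
        and frame['High'].max() == frame.max().max()
    )


good = pd.DataFrame({
    'Open': [1.0, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1.0, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})

bad = good.copy()
bad.loc[0, 'Close'] = 1.5  # hypothetical bad tick: Close above every High

print(low_high_bound_all(good))  # True
print(low_high_bound_all(bad))   # False
```

When the check fails, df.min().min() and df.max().max() still give the correct range, since they look at every column.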
Upvotes: 1