JarroVGIT

Reputation: 5304

MinMaxScaler with range from multiple columns in dataframe

I have an OHLC dataframe (Open, High, Low, Close) of sensor data sampled once per minute. I need to scale the values, but all four columns must share the same scale. The scale should be based on the minimum and maximum found across all four columns. For example, the minimum could be in column 'Low' and the maximum in column 'High'. I want to fit the scaler on that range, from min(df['Low']) to max(df['High']).

I am currently using the MinMaxScaler from sklearn.preprocessing. However, it scales each column independently, so I can only fit a given scale to one column. If I fit it to df['Open'] and then transform another column, the values are no longer guaranteed to lie between 0 and 1; they can be below 0 or above 1.
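The behaviour described above can be reproduced with a minimal sketch. The sample values are an assumption for illustration (they are the same numbers used in the answers on this page), and the variable names high_scaled and low_scaled are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative sample data, not the real sensor readings
df = pd.DataFrame({
    'Open': [1, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})

scaler = MinMaxScaler()
# Fit on 'Open' only: the scaler learns min=0.9 and max=1.1
scaler.fit(df[['Open']].to_numpy())

# Transforming other columns with that scaler leaves the [0, 1] range
high_scaled = scaler.transform(df[['High']].to_numpy()).ravel()
low_scaled = scaler.transform(df[['Low']].to_numpy()).ravel()

print(high_scaled.max())  # about 2.0, because 1.3 lies above the fitted maximum
print(low_scaled.min())   # about -1.0, because 0.7 lies below the fitted minimum
```

The arrays are passed via to_numpy() so that the scaler does not compare column names between fit and transform.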

How can I use the full range of all columns in the scaler?

Upvotes: 0

Views: 3449

Answers (2)

JarroVGIT

Reputation: 5304

If anybody ends up on this page: I found another way of doing this. Flatten the data into a single column with NumPy and feed that into the scaler, so it sees the minimum and maximum of all values at once. Reshaping the result back to the original shape and building a new dataframe from it solved my issue:

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Kudos to Nick, I used his df to illustrate my example.
df = pd.DataFrame({
  'Open': [1, 1.1, 0.9, 0.9],
  'High': [1.2, 1.2, 1.1, 1.3],
  'Low': [1, 1.0, 0.8, 0.7],
  'Close': [1.1, 1.2, 0.8, 1.2] 
})

scaler = MinMaxScaler()
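# Flatten all columns into one feature so the scaler uses the global min/max
df_np = scaler.fit_transform(df.to_numpy().reshape(-1, 1))
# Restore the original shape (instead of hard-coding 4 rows) and keep the index
df = pd.DataFrame(df_np.reshape(df.shape), index=df.index, columns=df.columns)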

#        Open      High       Low     Close
# 0  0.500000  0.833333  0.500000  0.666667
# 1  0.666667  0.833333  0.500000  0.833333
# 2  0.333333  0.666667  0.166667  0.166667
# 3  0.333333  1.000000  0.000000  0.833333
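A scaler fitted this way can also be reused on rows that arrive later, and inverted to get the original values back. The following is only a sketch under assumptions: the helper scale_frame and the frames train and new_rows are hypothetical names introduced for illustration, and the new row's values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.DataFrame({
    'Open': [1, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})
# Made-up later reading, for illustration only
new_rows = pd.DataFrame({'Open': [1.0], 'High': [1.3], 'Low': [0.7], 'Close': [1.0]})

# Fit once on the flattened training values (global min=0.7, max=1.3)
scaler = MinMaxScaler()
scaler.fit(train.to_numpy().reshape(-1, 1))


def scale_frame(frame, fitted_scaler):
    """Hypothetical helper: apply an already fitted single-feature scaler to every cell."""
    values = fitted_scaler.transform(frame.to_numpy().reshape(-1, 1))
    return pd.DataFrame(values.reshape(frame.shape),
                        index=frame.index, columns=frame.columns)


train_scaled = scale_frame(train, scaler)
new_scaled = scale_frame(new_rows, scaler)

# Undo the scaling to recover the original values
restored = scaler.inverse_transform(
    train_scaled.to_numpy().reshape(-1, 1)
).reshape(train.shape)
```

Note that new values outside the fitted range would still fall outside 0 to 1, since the scaler only knows the range it was fitted on.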

Upvotes: 4

Nick

Reputation: 147146

You could normalise all columns by doing the math yourself. Use df.min().min() and df.max().max() to get the minimum and maximum over the entire dataframe. For well-formed OHLC data you can more simply use df['Low'].min() and df['High'].max(), since each row's Open and Close lie between its Low and High. For example:

import pandas as pd

df = pd.DataFrame({
  'Open': [1, 1.1, 0.9, 0.9],
  'High': [1.2, 1.2, 1.1, 1.3],
  'Low': [1, 1.0, 0.8, 0.7],
  'Close': [1.1, 1.2, 0.8, 1.2] 
})
df
#    Open  High  Low  Close
# 0   1.0   1.2  1.0    1.1
# 1   1.1   1.2  1.0    1.2
# 2   0.9   1.1  0.8    0.8
# 3   0.9   1.3  0.7    1.2

# Named lo/hi to avoid shadowing the built-in min() and max()
lo = df.min().min()    # df['Low'].min()
hi = df.max().max()    # df['High'].max()
norm = (df - lo) / (hi - lo)
norm
#        Open      High       Low     Close
# 0  0.500000  0.833333  0.500000  0.666667
# 1  0.666667  0.833333  0.500000  0.833333
# 2  0.333333  0.666667  0.166667  0.166667
# 3  0.333333  1.000000  0.000000  0.833333
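The same arithmetic could be wrapped in a small function, sketched here under assumptions: the name scale_global is hypothetical, and the guard for a constant dataframe is an addition that the answer above does not include.

```python
import numpy as np
import pandas as pd


def scale_global(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: min-max scale every cell using the frame-wide range."""
    values = frame.to_numpy(dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values are identical, so the range is zero and scaling is undefined
        raise ValueError("cannot scale a dataframe whose values are all equal")
    return (frame - lo) / (hi - lo)


df = pd.DataFrame({
    'Open': [1, 1.1, 0.9, 0.9],
    'High': [1.2, 1.2, 1.1, 1.3],
    'Low': [1, 1.0, 0.8, 0.7],
    'Close': [1.1, 1.2, 0.8, 1.2],
})
norm = scale_global(df)
```

Because it uses the whole array rather than specific column names, this version does not depend on the data being OHLC.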

Upvotes: 1
