homeboykeroro
homeboykeroro

Reputation: 35

Element wise numeric comparison in Pandas dataframe column value with list

I have 3 pandas multiindex column dataframes

dataframe 1(minimum value):

  |  A    |   B   |  C   |
  |  Min  |  Min  |  Min |
  |-------|-------|------|
0 | 26.47 | 17.31 | 1.26 |
1 | 27.23 | 14.38 | 1.36 |
2 | 27.23 | 18.88 | 1.28 |

dataframe 2 (value used to compare with)

row 0, row 1 and row 2 are the same, I extend the dataframe to three row for comparison with min and max dataframe. Value in each dataframe cell is ndarray

  |          A          |           B           |          C         |
  |         Val         |          Val          |         Val        |
  |---------------------|-----------------------|--------------------|
0 | [27.58,28.37,28.73] | [17.31, 18.42, 18.72] | [1.36, 1.28, 1.27] |
1 | [27.58,28.37,28.73] | [17.31, 18.42, 18.72] | [1.36, 1.28, 1.27] |
2 | [27.58,28.37,28.73] | [17.31, 18.42, 18.72] | [1.36, 1.28, 1.27] |

dataframe 3(maximum value):

  |  A    |   B   |  C   |
  |  Max  |  Max  |  Max |
  |-------|-------|------|
0 | 28.68 | 18.42 | 1.37 |
1 | 29.50 | 17.31 | 1.47 |
2 | 29.87 | 20.45 | 1.39 |

Expected result:

  |          A          |           B           |          C           |
  |        Result       |          Result       |         Result       |
  |---------------------|-----------------------|----------------------|
0 | [True, True, False] |  [True, True, False]  | [True, True, True]   |
1 | [True, True, True]  | [True, False, False]  | [True, False, False] |
2 | [True, True, True]  | [False, False, False] | [True, True, False]  |

I'd like to perform element wise comparison in this way:

min <= each element in ndarray <= max

i.e

for row 0:

26.47 <= [27.58,28.37,28.73] <= 28.68

17.31 <= [17.31, 18.42, 18.72] <= 18.42

1.26 <= [1.36, 1.28, 1.27] <= 1.37

and so on

I tried ( datafram2 >= dataframe3 ) & ( datafram2 <= datafram3 ) but not work.

What's the simplest way and fastest way to compute the result?

Example dataframe code:

min_columns = pd.MultiIndex.from_product( [ [ 'A', 'B', 'C' ], [ 'Min' ] ] )
val_columns = pd.MultiIndex.from_product( [ [ 'A', 'B', 'C' ], [ 'Val' ] ] )
max_columns = pd.MultiIndex.from_product( [ [ 'A', 'B', 'C' ], [ 'Max' ] ] )

min_df = pd.DataFrame( [ [ 26.47, 17.31, 1.26 ], [ 27.23, 14.38, 1.36 ], [ 27.23, 18.88, 1.28 ] ], columns=min_columns )
val_df = pd.DataFrame( [ [ [ 27.58, 28.37, 28.73 ], [ 17.31, 18.42, 18.72], [1.36, 1.28, 1.27 ] ] ] , columns=val_columns )
max_df = pd.DataFrame( [ [ 28.68, 18.42, 1.37 ], [ 29.50, 17.31, 1.47 ], [ 29.87, 20.45, 1.39 ] ] , columns=max_columns )

Upvotes: 3

Views: 727

Answers (2)

Pygirl
Pygirl

Reputation: 13349

Just turn the column values into NumPy arrays. and simply treat it as an array comparing problem (row wise).

You can use apply:

def bool_check(row):
    col = row.name[0]
    min_val = df1[pd.IndexSlice[col]].to_numpy()
    max_val = df3[pd.IndexSlice[col]].to_numpy()
    x = np.array(row.tolist())
    return list((x >= min_val) & (x <= max_val))

res = df2.apply(bool_check,axis=0).rename(columns={'Val':'Result'})

res:

A B C
Result Result Result
0 [True, True, False] [True, True, False] [True, True, True]
1 [True, True, True] [True, False, False] [True, False, False]
2 [True, True, True] [False, False, False] [True, True, False]

Update

(Complete Solution Based on the data you've provided):

def bool_check(row):
    col = row.name[0]
    min_val = min_df[pd.IndexSlice[col]].to_numpy()
    max_val = max_df[pd.IndexSlice[col]].to_numpy()
    x = np.array(row.tolist())
    return list((x >= min_val) & (x <= max_val))

res = val_df.apply(bool_check,axis=0).rename(columns={'Val':'Result'})

Time Comparison:

Method 1 (Nk03's method1):

CPU times: user 19.5 ms, sys: 0 ns, total: 19.5 ms Wall time: 18.9 ms

Method 2 (Nk03's method2):

CPU times: user 23 ms, sys: 102 µs, total: 23.1 ms Wall time: 21.9 ms

Method 3 (Using numpy based comparison):

CPU times: user 8.76 ms, sys: 26 µs, total: 8.79 ms Wall time: 8.91 ms

Nk03's Updated and Optimized Solution:

CPU times: user 16 ms, sys: 0 ns, total: 16 ms Wall time: 15.5 ms

Upvotes: 3

Nk03
Nk03

Reputation: 14949

If you've dataframe's like these:

df1 = pd.DataFrame({'AMin': {0: 26.47, 1: 27.23, 2: 27.23},
 'BMin': {0: 17.31, 1: 14.38, 2: 18.88},
 'CMin': {0: 1.26, 1: 1.36, 2: 1.28}})


df2 = pd.DataFrame({'AVal': {0: [27.58, 28.37, 28.73],
  1: [27.58, 28.37, 28.73],
  2: [27.58, 28.37, 28.73]},
 'BVal': {0: [17.31, 18.42, 18.72],
  1: [17.31, 18.42, 18.72],
  2: [17.31, 18.42, 18.72]},
 'CVal': {0: [1.36, 1.28, 1.27], 1: [1.36, 1.28, 1.27], 2: [1.36, 1.28, 1.27]}})

df3 = pd.DataFrame({'AMax': {0: 28.68, 1: 29.5, 2: 29.87},
 'BMax': {0: 18.42, 1: 17.31, 2: 20.45},
 'CMax': {0: 1.37, 1: 1.47, 2: 1.39}})

Then you can explode the 2nd dataframe and compare the values.

m = df2.apply(pd.Series.explode).values
df = pd.DataFrame(
    (df1.iloc[np.arange(len(df1)).repeat(3)].values <= m) &
    (m <= df3.iloc[np.arange(len(df3)).repeat(3)].values),
    columns=df2.columns
)

df = df.groupby(df.index // 3).agg(list)

OUTPUT:

                     A                      B                     C
0  [True, True, False]    [True, True, False]    [True, True, True]
1   [True, True, True]   [True, False, False]  [True, False, False]
2   [True, True, True]  [False, False, False]   [True, True, False]

Upvotes: 3

Related Questions