Reputation: 345
Goal
Keep a row only if sub-column min equals sub-column max in every top-level column. If min and max do not equal each other in any of the columns (ao, hia, cyp1a2s, cyp3a4s in this case), drop the row. As the example below shows, a NaN/NaN pair counts as equal.
Example
import numpy as np
import pandas as pd

arrays = [np.array(['ao', 'ao', 'hia', 'hia', 'cyp1a2s', 'cyp1a2s', 'cyp3a4s', 'cyp3a4s']),
          np.array(['min', 'max', 'min', 'max', 'min', 'max', 'min', 'max'])]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['', ''])
df = pd.DataFrame(np.array([[1, 1, 0, 0, float('nan'), float('nan'), 0, 0],
                            [1, 1, 0, 0, float('nan'), 1, 0, 0],
                            [0, 2, 0, 0, float('nan'), float('nan'), 1, 1]]),
                  index=['1', '2', '3'], columns=index)
df
    ao       hia      cyp1a2s      cyp3a4s
   min  max  min  max     min  max     min  max
1  1.0  1.0  0.0  0.0     NaN  NaN     0.0  0.0
2  1.0  1.0  0.0  0.0     NaN  1.0     0.0  0.0
3  0.0  2.0  0.0  0.0     NaN  NaN     1.0  1.0
Want
df = pd.DataFrame(np.array([[1, 1, 0, 0, float('nan'), float('nan'), 0, 0]]), index=['1'], columns=index)
df
    ao       hia      cyp1a2s      cyp3a4s
   min  max  min  max     min  max     min  max
1  1.0  1.0  0.0  0.0     NaN  NaN     0.0  0.0
Attempt
df.apply(lambda x: x['min'].map(str) == x['max'].map(str), axis=1)
KeyError: ('min', 'occurred at index 1')
Note
The actual dataframe has 50+ columns.
Upvotes: 3
Views: 167
Reputation: 1247
df.apply() didn't work because each row is indexed by both column levels, so you need to reference the top-level column before 'min' or 'max'.
Also, .map(str) is not available on the scalar float64 values this lookup returns; use .astype(str) instead.
The following works for more than one column per list. It keeps rows where min equals max in every column of eqCols and differs in every column of neqCols:
eqCols = ['cyp1a2s', 'hia']
neqCols = list(set(df.xs('min', level=1, axis=1).columns) - set(eqCols))
EQ = lambda r, c: r[c]['min'].astype(str) == r[c]['max'].astype(str)
df[df.apply(lambda r: all(EQ(r, c) for c in eqCols) and all(not EQ(r, c) for c in neqCols), axis=1)]
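To see the KeyError concretely: with axis=1, each row arrives as a Series whose index is the two-level column MultiIndex, so a bare 'min' is looked up on the first level and is not found. A minimal sketch, assuming a cut-down two-column version of the question's frame (the variable names row, row_min, row_max and so on are illustrative, not from the original post):

```python
import pandas as pd

# Cut-down version of the question's frame: two top-level columns only.
columns = pd.MultiIndex.from_product([["ao", "hia"], ["min", "max"]])
df = pd.DataFrame([[1.0, 1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0, 0.0]], index=["1", "3"], columns=columns)

row = df.loc["3"]  # the kind of Series apply(..., axis=1) hands to the lambda

# 'min' lives on the second level, so a plain label lookup fails.
try:
    row["min"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Selecting on the second level explicitly works.
row_min = row.xs("min", level=1)
row_max = row.xs("max", level=1)
all_equal = bool((row_min == row_max).all())
```

Here lookup_failed ends up True, and all_equal is False because ao has min 0.0 and max 2.0 in row '3'.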
Upvotes: 1
Reputation: 862591
Use DataFrame.xs to select DataFrames by the second level of the MultiIndex, and replace the NaNs:
df1 = df.xs('min', axis=1, level=1).fillna('nan')
df2 = df.xs('max', axis=1, level=1).fillna('nan')
Or convert data to strings:
df1 = df.xs('min', axis=1, level=1).astype('str')
df2 = df.xs('max', axis=1, level=1).astype('str')
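The replacement or string conversion matters because NaN never compares equal to itself, so a direct comparison would reject row 1 of the example. A small sketch of the difference, using two hypothetical one-column frames (lo and hi are made-up names standing in for df1 and df2):

```python
import numpy as np
import pandas as pd

lo = pd.DataFrame({"cyp1a2s": [np.nan, np.nan]}, index=["1", "2"])
hi = pd.DataFrame({"cyp1a2s": [np.nan, 1.0]}, index=["1", "2"])

# Direct comparison: NaN == NaN is False, so even row '1' looks unequal.
raw = lo.eq(hi)["cyp1a2s"].tolist()

# After converting to strings, 'nan' == 'nan' is True.
as_str = lo.astype(str).eq(hi.astype(str))["cyp1a2s"].tolist()
```

With these inputs raw is [False, False] while as_str is [True, False].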
Compare the DataFrames with DataFrame.eq, test whether all values in each row are True with DataFrame.all, and finally filter by boolean indexing:
df = df[df1.eq(df2).all(axis=1)]
print (df)
    ao       hia      cyp1a2s      cyp3a4s
   min  max  min  max     min  max     min  max
1  1.0  1.0  0.0  0.0     NaN  NaN     0.0  0.0
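As a variation on the same idea, the NaN handling can also be done without changing dtypes, by treating "missing on both sides" as a match. This is a sketch of an alternative, not the method shown above; it rebuilds the question's data with MultiIndex.from_product for brevity, and the names lo, hi, match and result are illustrative:

```python
import pandas as pd

nan = float("nan")
columns = pd.MultiIndex.from_product(
    [["ao", "hia", "cyp1a2s", "cyp3a4s"], ["min", "max"]]
)
df = pd.DataFrame(
    [[1, 1, 0, 0, nan, nan, 0, 0],
     [1, 1, 0, 0, nan, 1, 0, 0],
     [0, 2, 0, 0, nan, nan, 1, 1]],
    index=["1", "2", "3"], columns=columns,
)

lo = df.xs("min", axis=1, level=1)
hi = df.xs("max", axis=1, level=1)

# Equal values, or missing on both sides, count as a match.
match = lo.eq(hi) | (lo.isna() & hi.isna())

# Keep only the rows where every top-level column matches.
result = df[match.all(axis=1)]
```

On the example data this keeps only row '1', the same row as the string-based comparison, while the values stay numeric.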
Upvotes: 2