Reputation: 95
I have a dataframe with a column 'rel_max' that holds, for each row, a list of that row's local maxima (if it is relevant or more useful, I also have a column with the indices of these local extrema). I would like to use this list of values or indices to mask the dataframe, so that each maximum stays in its correct spot and every other value becomes NaN or 0.
import pandas as pd

df = pd.DataFrame({'123': [20.908, 8.743, 8.34, 2.4909],
                   '124': [2, 2.34, 0, 4.1234],
                   '412': [2, 20.123, 3.123123, 0],
                   '516': [5, 20.120, 4.12, 0],
                   '129': [6, 20.10, 3.123123, 0],
                   'rel_max': [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
                  index=['2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10'])
The dataframe above contains the relative max values.
This is the expected dataframe:
df1 = pd.DataFrame({'123': [20.908, 8.743, 8.34, 0],
                    '124': [0, 0, 0, 4.1234],
                    '412': [0, 20.123, 0, 0],
                    '516': [0, 0, 4.12, 0],
                    '129': [6, 0, 0, 0],
                    'rel_max': [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
                   index=['2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10'])
Essentially, I am trying to retrieve or pull the dataframe with only the local extrema.
               123     124     412   516  129          rel_max
2015-01-10  20.908  0.0000   0.000  0.00    6      [20.908, 6]
2015-02-10   8.743  0.0000  20.123  0.00    0  [8.743, 20.123]
2015-03-10   8.340  0.0000   0.000  4.12    0     [8.34, 4.12]
2015-04-10   0.000  4.1234   0.000  0.00    0         [4.1234]
Upvotes: 0
Views: 1926
Reputation: 59264
Use indexing. First pull the first and second values out of each rel_max list, then use numpy and pandas broadcasting to create two masks, m1 and m2. Finally, index df with the combined mask.
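import numpy as np

smax = df.rel_max.str[0]  # first value of each rel_max list
smin = df.rel_max.str[1]  # second value (NaN when the list has only one)
# Stretch each per-row value across all columns and compare cell by cell
m1 = df == np.broadcast_to(smax.values.reshape(-1,1), df.shape)
m2 = df == np.broadcast_to(smin.values.reshape(-1,1), df.shape)
df[m1 | m2]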
In more detail, smax is a series holding the first value of each rel_max list and smin is a series holding the second. m1 is a data frame of True/False values: it is True wherever a cell in df equals the broadcast value for its row, and m2 works the same way for smin. I'd recommend running each part of the code on its own and looking at the output; it's more intuitive ;)
Output is:
               123     124     412   516  129          rel_max
2015-01-10  20.908  0.0000   0.000  0.00    6      [20.908, 6]
2015-02-10   8.743  0.0000  20.123  0.00    0  [8.743, 20.123]
2015-03-10   8.340  0.0000   0.000  4.12    0     [8.34, 4.12]
2015-04-10   0.000  4.1234   0.000  0.00    0         [4.1234]
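For lists with more than two entries, the same broadcasting idea can be stretched to compare every cell against every listed value at once. The following is only a sketch under a few assumptions that are mine, not part of the answer above: the names values, padded, mask and result are made up for illustration, the lists are padded with NaN into a rectangular frame, and where(mask, 0) is used because indexing with a boolean frame leaves NaN rather than 0 in the non-matching cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'123': [20.908, 8.743, 8.34, 2.4909],
                   '124': [2, 2.34, 0, 4.1234],
                   '412': [2, 20.123, 3.123123, 0],
                   '516': [5, 20.120, 4.12, 0],
                   '129': [6, 20.10, 3.123123, 0],
                   'rel_max': [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
                  index=['2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10'])

# Work on the numeric columns only; the list column is re-attached at the end.
values = df.drop(columns='rel_max')

# Pad the ragged lists into a rectangular frame (NaN where a list is shorter).
padded = pd.DataFrame(df['rel_max'].tolist(), index=df.index)

# (rows, cols, 1) == (rows, 1, k) broadcasts to (rows, cols, k);
# a cell is kept if it equals any of the k listed values of its row.
# NaN padding never compares equal, so short lists are handled for free.
mask = (values.to_numpy()[:, :, None] == padded.to_numpy()[:, None, :]).any(axis=2)

result = values.where(mask, 0)      # use values.where(mask) to get NaN instead
result['rel_max'] = df['rel_max']
print(result)
```

This relies on exact float equality, which is fine here because the listed values were taken from the same cells; if they were recomputed, np.isclose would probably be the safer comparison.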
Upvotes: 1
Reputation: 153460
You could try something like this:
pd.concat([df.iloc[:, :-1].where(df.apply(lambda x: x[:-1].isin(x.iloc[-1]), axis=1), 0),
df.iloc[:, -1]], axis=1)
Output:
               123     124     412   516  129          rel_max
2015-01-10  20.908  0.0000   0.000  0.00  6.0      [20.908, 6]
2015-02-10   8.743  0.0000  20.123  0.00  0.0  [8.743, 20.123]
2015-03-10   8.340  0.0000   0.000  4.12  0.0     [8.34, 4.12]
2015-04-10   0.000  4.1234   0.000  0.00  0.0         [4.1234]
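If the one-liner is hard to read, the same row-wise isin idea can be unpacked into a small helper. This is a sketch rather than the answer's exact code: keep_listed_values and its list_col and fill parameters are hypothetical names introduced here, and it assumes the index labels are unique so that each row's list can be looked up by label.

```python
import pandas as pd

df = pd.DataFrame({'123': [20.908, 8.743, 8.34, 2.4909],
                   '124': [2, 2.34, 0, 4.1234],
                   '412': [2, 20.123, 3.123123, 0],
                   '516': [5, 20.120, 4.12, 0],
                   '129': [6, 20.10, 3.123123, 0],
                   'rel_max': [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
                  index=['2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10'])


def keep_listed_values(frame, list_col='rel_max', fill=0):
    """Hypothetical helper: keep cells whose value appears in the row's list.

    Assumes unique index labels; every other cell is replaced by `fill`.
    """
    values = frame.drop(columns=list_col)
    # Row by row: is each value one of that row's listed maxima?
    mask = values.apply(
        lambda row: row.isin(frame.at[row.name, list_col]), axis=1
    ).astype(bool)
    out = values.where(mask, fill)
    out[list_col] = frame[list_col]   # re-attach the untouched list column
    return out


zeros = keep_listed_values(df)                      # 0 in masked cells
nans = keep_listed_values(df, fill=float('nan'))    # NaN in masked cells
print(zeros)
```

Selecting the list column by name instead of by position means it does not have to be the last column, and the fill argument covers both the "NaN or 0" variants asked for in the question.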
Upvotes: 1