rye_bread

Reputation: 95

How to mask a dataframe given a list of values or indices in the dataframe

I have a dataframe with a column 'rel_max' that holds, for each row, a list of that row's local maxima (I also have a column with the indices of these local extrema, if that is more useful). I would like to use this list of values or indices to mask the dataframe, so that each maximum stays in its original position and every other value becomes NaN or 0.

import pandas as pd

df = pd.DataFrame({'123': [20.908, 8.743, 8.34, 2.4909],
                   '124': [2, 2.34, 0, 4.1234],
                   '412': [2, 20.123, 3.123123, 0],
                   '516': [5, 20.120, 4.12, 0],
                   '129': [6, 20.10, 3.123123, 0],
                   'rel_max': [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
                  index=['2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10'])

The dataframe above holds the data together with the relative max values.

This is the expected dataframe.

df1 = pd.DataFrame({'123': [20.908, 8.743, 8.34, 0],
                    '124': [0, 0, 0, 4.1234],
                    '412': [0, 20.123, 0, 0],
                    '516': [0, 0, 4.12, 0],
                    '129': [6, 0, 0, 0],
                    'rel_max': [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
                   index=['2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10'])

Essentially, I am trying to retrieve or pull the dataframe with only the local extrema.

               123     124     412   516  129          rel_max
2015-01-10  20.908  0.0000   0.000  0.00    6      [20.908, 6]
2015-02-10   8.743  0.0000  20.123  0.00    0  [8.743, 20.123]
2015-03-10   8.340  0.0000   0.000  4.12    0     [8.34, 4.12]
2015-04-10   0.000  4.1234   0.000  0.00    0         [4.1234]

Upvotes: 0

Views: 1926

Answers (2)

rafaelc

Reputation: 59264

Use indexing. First pull the first and second value out of each rel_max list (named smax and smin in the code), then use numpy broadcasting to build two masks, m1 and m2. Finally, combine the masks and index the dataframe with the result.

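import numpy as np

# First and second entry of each rel_max list
# (the second is NaN when a list has only one entry)
smax = df.rel_max.str[0]
smin = df.rel_max.str[1]

# Compare every cell with its row's value, stretched across all columns
m1 = df == np.broadcast_to(smax.values.reshape(-1,1), df.shape)

m2 = df == np.broadcast_to(smin.values.reshape(-1,1), df.shape)

# Keep the cells where either mask is True
df[m1 | m2]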

In more detail, smax is a Series holding the first value of each rel_max list and smin is a Series holding the second. m1 is a dataframe of True/False values: a cell is True when it equals the smax value broadcast across its row, and m2 does the same for smin. I'd recommend running each part of the code on its own and looking at the output, it's more intuitive ;)

Output is:

               123     124     412   516  129          rel_max
2015-01-10  20.908  0.0000   0.000  0.00    6      [20.908, 6]
2015-02-10   8.743  0.0000  20.123  0.00    0  [8.743, 20.123]
2015-03-10   8.340  0.0000   0.000  4.12    0     [8.34, 4.12]
2015-04-10   0.000  4.1234   0.000  0.00    0         [4.1234]
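
For lists of varying length, the same broadcasting idea might be generalized roughly as follows. This is only a sketch under a few assumptions: the names values, padded, mask and masked are made up for illustration, the rel_max lists are padded with NaN into a rectangular frame, and cells are matched by exact float equality, which works here only because the list entries are the very same numbers as the cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"123": [20.908, 8.743, 8.34, 2.4909],
     "124": [2, 2.34, 0, 4.1234],
     "412": [2, 20.123, 3.123123, 0],
     "516": [5, 20.120, 4.12, 0],
     "129": [6, 20.10, 3.123123, 0],
     "rel_max": [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
    index=["2015-01-10", "2015-02-10", "2015-03-10", "2015-04-10"],
)

# Numeric part only; the list column is added back at the end.
values = df.drop(columns="rel_max")

# One column per list position; shorter lists are padded with NaN.
padded = pd.DataFrame(df["rel_max"].tolist(), index=df.index)

# Start from an all-False mask and OR in one comparison per list position.
mask = np.zeros(values.shape, dtype=bool)
for pos in padded.columns:
    # A (n_rows, 1) column vector broadcasts across all value columns.
    # NaN padding never compares equal, so it never sets a cell to True.
    mask |= values.to_numpy() == padded[pos].to_numpy().reshape(-1, 1)

# Keep matching cells, replace everything else with 0.
masked = values.where(mask, 0)
masked["rel_max"] = df["rel_max"]
print(masked)
```

Passing np.nan instead of 0 to where would give the NaN variant the question also allows.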

Upvotes: 1

Scott Boston

Reputation: 153460

You could try something like this:

pd.concat([df.iloc[:, :-1].where(df.apply(lambda x: x[:-1].isin(x.iloc[-1]), axis=1), 0), 
           df.iloc[:, -1]], axis=1)

Output:

               123     124     412   516  129          rel_max
2015-01-10  20.908  0.0000   0.000  0.00  6.0      [20.908, 6]
2015-02-10   8.743  0.0000  20.123  0.00  0.0  [8.743, 20.123]
2015-03-10   8.340  0.0000   0.000  4.12  0.0     [8.34, 4.12]
2015-04-10   0.000  4.1234   0.000  0.00  0.0         [4.1234]
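
The one-liner above can be hard to read, so here is one way it might be unpacked into steps. This is a sketch, not a different method: it selects the list column by name instead of by position, and the names values, keep, masked and result are illustrative. The astype(bool) call is a precaution in case the row-wise apply hands back an object-typed frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"123": [20.908, 8.743, 8.34, 2.4909],
     "124": [2, 2.34, 0, 4.1234],
     "412": [2, 20.123, 3.123123, 0],
     "516": [5, 20.120, 4.12, 0],
     "129": [6, 20.10, 3.123123, 0],
     "rel_max": [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
    index=["2015-01-10", "2015-02-10", "2015-03-10", "2015-04-10"],
)

# Step 1: the numeric columns that should be masked.
values = df.drop(columns="rel_max")

# Step 2: for every row, test which cells appear in that row's rel_max list.
keep = df.apply(
    lambda row: row.drop("rel_max").isin(row["rel_max"]),
    axis=1,
).astype(bool)

# Step 3: keep the matching cells and put 0 everywhere else.
masked = values.where(keep, 0)

# Step 4: glue the untouched rel_max column back on.
result = pd.concat([masked, df["rel_max"]], axis=1)
print(result)
```

Because the membership test runs once per row, this handles lists of any length, at the cost of being slower than a vectorized comparison on large frames.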

Upvotes: 1
