d_kennetz

Reputation: 5359

Python Pandas: retrieving values in one column while they are less than the value of a second column

Suppose I have a df that looks like this:

   posF  ffreq  posR  rfreq
0    10   0.50  11.0   0.08
1    20   0.20  31.0   0.90
2    30   0.03  41.0   0.70
3    40   0.72  51.0   0.08
4    50   0.09  81.0   0.78
5    60   0.09   NaN    NaN
6    70   0.01   NaN    NaN
7    80   0.09   NaN    NaN
8    90   0.08   NaN    NaN
9   100   0.02   NaN    NaN

In the posR column, the values jump from 11 to 31, so there is no value in the 20s. I want to insert a row to fill that gap, which would essentially be the posF value for posR and NaN for rfreq, so my resulting df would look like this:

   posF  ffreq  posR  rfreq
0    10   0.50  11.0   0.08
1    20   0.20  20     NaN
2    30   0.03  31.0   0.90
3    40   0.72  41.0   0.70
4    50   0.09  50     NaN
5    60   0.09  60     NaN
6    70   0.01  70     NaN
7    80   0.09  80     NaN
8    90   0.08  81.0   0.78
9   100   0.02  100    NaN

In other words, I want to fill the gaps in posR with the posF values that fall between the existing posR values.

What I have tried is to build a dummy list and append a value to it whenever x is less than a (I see the flaw here, but I don't know how to fix it).

insert_rows = []
for x in df['posF']:
    for a, b in zip(df['posR'], df['rfreq']):
        if x < a:
            insert_rows.append([x, 'NA'])
print(len(insert_rows))  # 21, should be 5

I realize that it appends x once for every posR value that is greater than x, so most values end up in the list several times.
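A minimal sketch of how the loop could be fixed, assuming that a "gap" means no posR value falls between a posF value x and the next posF value. The key change is to decide once per x, after looking at every posR value, instead of appending inside the inner loop. Under that assumption the sample yields exactly the 5 rows the comment expects (20, 60, 70, 90 and 100). The names `pos_f`, `pos_r` and `upper` are illustrative, and np.nan is used instead of the string 'NA' so the column stays numeric.

```python
import numpy as np

pos_f = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # posF from the sample
pos_r = [11, 31, 41, 51, 81]                        # the non-NaN posR values

# Upper edge of each posF "bin" is the next posF value; the last bin is open-ended
upper = pos_f[1:] + [np.inf]

insert_rows = []
for x, hi in zip(pos_f, upper):
    # any() scans posR for this x; x is appended at most once,
    # and only when no posR value falls in [x, hi)
    if not any(x <= a < hi for a in pos_r):
        insert_rows.append([x, np.nan])

print(len(insert_rows))             # 5
print([x for x, _ in insert_rows])  # [20, 60, 70, 90, 100]
```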

After this I will just create a new df and add these values to the original 2 columns so they are the same length.
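A minimal sketch of that last step, assuming a posF row needs a filler when no posR value lies between it and the next posF value. Instead of a loop, it uses np.searchsorted to find the posF row each posR value belongs to (this requires posF to be sorted ascending), builds filler rows for the remaining posF values, stacks them under the observed posR/rfreq pairs, sorts by position and writes the result back. The write-back only lines up because the sample has 5 observed and 5 filler rows, i.e. exactly len(df). Note that this keeps 51 and puts 81 next to posF 80, so it differs from the hand-written table above in those rows.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "posF": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "ffreq": [0.50, 0.20, 0.03, 0.72, 0.09, 0.09, 0.01, 0.09, 0.08, 0.02],
    "posR": [11, 31, 41, 51, 81] + [np.nan] * 5,
    "rfreq": [0.08, 0.90, 0.70, 0.08, 0.78] + [np.nan] * 5,
})

observed = df[["posR", "rfreq"]].dropna()

# For each posR value, the row of the largest posF that is <= that value
rows_hit = np.searchsorted(df["posF"].to_numpy(),
                           observed["posR"].to_numpy(), side="right") - 1
empty = ~np.isin(np.arange(len(df)), rows_hit)  # posF rows with no posR value

# Filler rows: posF value as the position, NaN as the frequency
filler = pd.DataFrame({"posR": df.loc[empty, "posF"].astype(float), "rfreq": np.nan})

combined = pd.concat([observed, filler]).sort_values("posR", ignore_index=True)

# 5 observed + 5 filler rows = len(df), so the columns can be written back
df["posR"] = combined["posR"].to_numpy()
df["rfreq"] = combined["rfreq"].to_numpy()
print(df)
```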

If you can think of a better title, feel free to edit.

Upvotes: 1

Views: 98

Answers (1)

SpghttCd

Reputation: 10860

My first thought was to retrieve the new indices for the entries in posR by interpolating against posF, and then move the values to their new positions. However, since your expected output has 81 one row later than this approach places it, I'm afraid this is not exactly what you're looking for, and I still don't fully understand the logic behind your task.
Still, perhaps this is a starting point, so let's see...

The approach works as follows:

Retrieve the new index positions of the values in posR according to their order in posF:

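import numpy as np
# Fractional row position of each posR value along posF (posF must be sorted
# ascending); rounding snaps it to the nearest row. NaN posR values stay NaN.
idx = np.interp(df.posR, df.posF, df.index).round()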
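To see why 81 ends up at row 7, here are the raw interpolated positions before rounding, computed on the non-NaN posR values of the sample (a standalone sketch; the arrays are retyped from the question, and `rows` stands in for df.index). Each value sits 1 above a posF value, i.e. 0.1 of the way to the next row, so rounding snaps it back to the posF row just below it; a value such as 86 would instead round up to the following row.

```python
import numpy as np

pos_f = np.arange(10, 101, 10)          # posF: 10, 20, ..., 100
rows = np.arange(len(pos_f))            # row positions 0..9 (df.index in the answer)
pos_r = np.array([11, 31, 41, 51, 81])  # the non-NaN posR values

# Linear interpolation maps each posR value to a fractional row position
raw = np.interp(pos_r, pos_f, rows)
print(raw)                      # approximately 0.1, 2.1, 3.1, 4.1, 7.1
print(raw.round().astype(int))  # [0 2 3 4 7]
```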

Get rid of nan entries and cast to int:

idx = idx[np.isfinite(idx)].astype(int)

As a first step, create a new column newposR as a copy of posF, and a new column newrfreq filled with NaN:

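# Float copy of posF, so the posR values (floats) can be written in without a dtype change
df['newposR'] = df.posF.astype(float)
df['newrfreq'] = np.nan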

Then overwrite with the values from posR and rfreq, but now at the updated positions:

# Assumes the non-NaN posR/rfreq values are the first len(idx) rows, as in the sample
df.loc[idx, 'newposR'] = df.posR.iloc[:len(idx)].values
df.loc[idx, 'newrfreq'] = df.rfreq.iloc[:len(idx)].values

Result:

   posF  ffreq  posR  rfreq  newposR  newrfreq
0    10   0.50  11.0   0.08     11.0      0.08
1    20   0.20  31.0   0.90     20.0       NaN
2    30   0.03  41.0   0.70     31.0      0.90
3    40   0.72  51.0   0.08     41.0      0.70
4    50   0.09  81.0   0.78     51.0      0.08
5    60   0.09   NaN    NaN     60.0       NaN
6    70   0.01   NaN    NaN     70.0       NaN
7    80   0.09   NaN    NaN     81.0      0.78
8    90   0.08   NaN    NaN     90.0       NaN
9   100   0.02   NaN    NaN    100.0       NaN

Upvotes: 1
