user15649753

How to determine the index of a column under some conditions?

How can I do the following in pandas? Let's say I have a column, ind_column, that holds an index. I want to use that index to look up rows in col1 and do the following:

It is hard to explain, but I think it is clearer with an example.

Consider the following data:

ind_column     col1
4             0.5
4             0.65
4             0.6
4             0.2
4             0.1
4             0.8
4             -0.3
4             -0.2
4             -0.3

Here, ind_column is 4, so we look at the 3 rows before row 4 and the 3 rows after it, i.e. rows 1 through 7 (counting from 0). The sign changes exactly once in this window, at row 6: all values before it are positive and all values from it onward are negative, so index 6 is selected.
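A minimal sketch of this rule for the first example, using plain Python lists and assuming 0-based row positions. The names `values`, `idx`, `window`, `changes` and `selected` are illustrative only, not part of the question:

```python
# col1 values from the first example; ind_column is 4 in every row
values = [0.5, 0.65, 0.6, 0.2, 0.1, 0.8, -0.3, -0.2, -0.3]
idx = 4

# rows idx-3 .. idx+3 (inclusive), clipped to the bounds of the data
start, stop = max(idx - 3, 0), min(idx + 3, len(values) - 1)
window = list(range(start, stop + 1))
print(window)  # [1, 2, 3, 4, 5, 6, 7]


def sign(x):
    # -1, 0 or 1, like np.sign
    return (x > 0) - (x < 0)


# a row "changes sign" when its sign differs from the previous row in the window
changes = [r for r in window[1:] if sign(values[r]) != sign(values[r - 1])]
print(changes)  # [6]

# exactly one change -> select that row, otherwise keep idx
selected = changes[0] if len(changes) == 1 else idx
print(selected)  # 6
```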

Now let's consider the following data:

ind_column     col1
5             0.5
5             0.65
5             -0.6
5             0.2
5             -0.1
5             0.8
5             0.3
5             -0.2
5             -0.3

Now we look at rows 2 through 8. Since the sign changes more than once within these rows, index 5 itself is selected.
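To make "more than once" concrete, here is a small NumPy sketch (an illustration, not the asker's code) that lists the rows of the second example where the sign flips inside the window. It assumes 0-based positions; `start`, `signs` and `change_rows` are hypothetical names:

```python
import numpy as np

# col1 values from the second example; ind_column is 5 in every row
values = np.array([0.5, 0.65, -0.6, 0.2, -0.1, 0.8, 0.3, -0.2, -0.3])
idx = 5

start = max(idx - 3, 0)
window = values[start:idx + 4]          # rows 2..8
signs = np.sign(window)                 # -1, 1, -1, 1, 1, -1, -1

# np.diff is non-zero wherever consecutive signs differ; shift back to row numbers
change_rows = np.flatnonzero(np.diff(signs)) + start + 1
print(change_rows)  # [3 4 5 7]

# four changes, not exactly one -> keep idx
selected = int(change_rows[0]) if len(change_rows) == 1 else idx
print(selected)  # 5
```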

In the following data, index 5 is also selected, since the sign never changes:

ind_column     col1
5             0.5
5             0.65
5             0.6
5             0.2
5             0.1
5             0.8
5             0.3
5             0.2
5             0.3
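Taken together, the three examples suggest a rule that can be sketched in plain Python as below. `select_index` and its `half_width` parameter are hypothetical names introduced here for illustration; the sketch assumes 0-based rows and treats a value of exactly 0 as its own sign, as `np.sign` does:

```python
def select_index(values, idx, half_width=3):
    """Hypothetical helper: return the single sign-change row in the window
    idx-half_width .. idx+half_width, or idx if there are zero or several changes."""
    def sign(x):
        # -1, 0 or 1, like np.sign
        return (x > 0) - (x < 0)

    start = max(idx - half_width, 0)
    stop = min(idx + half_width, len(values) - 1)
    # rows whose sign differs from the previous row inside the window
    changes = [r for r in range(start + 1, stop + 1)
               if sign(values[r]) != sign(values[r - 1])]
    return changes[0] if len(changes) == 1 else idx


first = [0.5, 0.65, 0.6, 0.2, 0.1, 0.8, -0.3, -0.2, -0.3]
second = [0.5, 0.65, -0.6, 0.2, -0.1, 0.8, 0.3, -0.2, -0.3]
third = [0.5, 0.65, 0.6, 0.2, 0.1, 0.8, 0.3, 0.2, 0.3]

print([select_index(first, 4), select_index(second, 5), select_index(third, 5)])  # [6, 5, 5]
```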

Answers (1)

ouroboros1

One approach could be as follows:

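import pandas as pd
import numpy as np

def get_index(df):
    # the index stored in ind_column (same value in every row); int() gives a plain Python int
    idx = int(df.loc[0, 'ind_column'])
    # rows idx-3 .. idx+3 (start clipped at 0), then True wherever the sign flips
    tmp = np.sign(df.iloc[max(idx - 3, 0):idx + 4]['col1']).diff().dropna().ne(0)
    # exactly one sign change: return the row label where it happens
    if tmp.sum() == 1:
        return tmp[tmp].index.item()
    # no change or several changes: keep the original index
    return idx

indices = [get_index(df) for df in [df1, df2, df3]]  # df1, df2, df3 are defined below

print(indices)
# [6, 5, 5]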

Explanation

  • Inside the function, we first retrieve the "start" index from column ind_column (e.g. 4 or 5 in the example dfs).
  • Next, we use df.iloc to select the rows from 3 positions before idx up to and including 3 positions after it (hence idx+4, since the end of a slice is exclusive). We use max(idx-3,0) so that the slice never starts at a negative position (i.e. when idx < 3): iloc would count such a position from the end of the df, which usually yields an empty or wrong selection.
  • From the df slice, we select column col1 and apply np.sign to get a series with -1, 0 or 1.
  • Now, we chain Series.diff to get the difference between consecutive rows, and use Series.dropna to get rid of the first row, which will be NaN by definition.
  • Finally, we chain Series.ne with 0. This will get us the series with booleans: True where there is a switch in sign, False where there is no switch.
  • The resulting pd.Series is stored as tmp. Now we simply have to check whether its sum equals 1 (each True counts as 1). If it does, we return the index of the single True value with tmp[tmp].index.item(). Otherwise there are either no switches at all or multiple ones, and in both cases we simply return idx.
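To see what each step produces, the chain can be unrolled for the first example. This is a sketch that rebuilds df1 inline with a default RangeIndex (equivalent to data1 below); the intermediate names `window`, `signs` and `steps` are illustrative only:

```python
import numpy as np
import pandas as pd

# first example from the question (default RangeIndex 0..8)
df1 = pd.DataFrame({'ind_column': [4] * 9,
                    'col1': [0.5, 0.65, 0.6, 0.2, 0.1, 0.8, -0.3, -0.2, -0.3]})

idx = int(df1.loc[0, 'ind_column'])                 # 4
window = df1.iloc[max(idx - 3, 0):idx + 4]['col1']  # rows 1..7
signs = np.sign(window)                             # 1.0 for rows 1..5, -1.0 for rows 6..7
steps = signs.diff()                                # NaN, 0, 0, 0, 0, -2, 0
tmp = steps.dropna().ne(0)                          # rows 2..7, True only at row 6

print(tmp.tolist())                                 # [False, False, False, False, True, False]
print(int(tmp.sum()), tmp[tmp].index.item())        # 1 6
```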

Data used for df1, df2, df3 above

data1 = {'ind_column': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4, 5: 4, 6: 4, 7: 4, 8: 4}, 
        'col1': {0: 0.5, 1: 0.65, 2: 0.6, 3: 0.2, 4: 0.1, 5: 0.8, 6: -0.3, 
                 7: -0.2, 8: -0.3}}
df1 = pd.DataFrame(data1)

data2 = {'ind_column': {0: 5, 1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5, 8: 5}, 
         'col1': {0: 0.5, 1: 0.65, 2: -0.6, 3: 0.2, 4: -0.1, 5: 0.8, 6: 0.3, 
                  7: -0.2, 8: -0.3}}
df2 = pd.DataFrame(data2)

data3 = {'ind_column': {0: 5, 1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5, 8: 5}, 
         'col1': {0: 0.5, 1: 0.65, 2: 0.6, 3: 0.2, 4: 0.1, 5: 0.8, 6: 0.3,
                  7: 0.2, 8: 0.3}}
df3 = pd.DataFrame(data3)
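The max(idx-3,0) guard only matters when idx is smaller than 3, which none of the three example frames exercises. The sketch below uses a hypothetical df4 (invented data, not from the question) with idx = 1 to show the difference; get_index is repeated so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

def get_index(df):
    # same logic as the function above
    idx = int(df.loc[0, 'ind_column'])
    tmp = np.sign(df.iloc[max(idx - 3, 0):idx + 4]['col1']).diff().dropna().ne(0)
    if tmp.sum() == 1:
        return tmp[tmp].index.item()
    return idx

# hypothetical data: idx = 1, so the window would start at -2 without clipping
df4 = pd.DataFrame({'ind_column': [1] * 6,
                    'col1': [0.5, 0.4, -0.4, -0.1, -0.3, 0.7]})

print(get_index(df4))              # 2: window is rows 0..4, one sign change at row 2
print(len(df4.iloc[1 - 3:1 + 4]))  # 1: iloc[-2:5] counts -2 from the end, leaving only row 4
```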
