Fetch previous rows based on if condition and Shift function - Python dataframe

Question

I have data as shown below. I would like to select rows based on two conditions.

1) rows that start with digits (1,2,3 etc)

2) previous row of the records that satisfy 1st condition

Please find the how the input data looks like

Please find how I expect the output to be

I tried using the shift(-1) function but it seems to be throwing error. I am sure I messed up with the logic/syntax. Please find the code below that I tried

 # i get the index of all records that start with number.    
 s=df1.loc[df1['VARIABLE'].str.contains('^\d')==True].index 
 # now I need to get the previous record of each group but this is 
                                                         #incorrect
 df1.loc[((df1['VARIABLE'].shift(-1).str.contains('^\d')==False) & 
 (df1['VARIABLE'].str.contains('^\d')==True))].index

jezrael · Accepted Answer

Use:

df1 = pd.DataFrame({'VARIABLE':['studyid',np.nan,'age_interview','Gender','1.Male',
                                '2.Female',np.nan, 'dob', 'eth',
                                'Ethnicity','1.Chinese','2.Indian','3.Malay']})

#first remove missing rows by column VARIABLE
df1 = df1.dropna(subset=['VARIABLE'])
#test startinf numbers
s = (df1['VARIABLE'].str.contains('^\d'))
#chain shifted values by | for OR
mask  = s | s.shift(-1)
#filtering by boolean indexing
df1 = df1[mask]
print (df1)
     VARIABLE
3      Gender
4      1.Male
5    2.Female
9   Ethnicity
10  1.Chinese
11   2.Indian
12    3.Malay

Fetch previous rows based on if condition and Shift function - Python dataframe

Answers (1)

Related Questions