The Great
The Great

Reputation: 7713

Fetch previous rows based on if condition and Shift function - Python dataframe

I have data as shown below. I would like to select rows based on two conditions.

1) rows that start with digits (1,2,3 etc)

2) previous row of the records that satisfy 1st condition

Please find the how the input data looks like

enter image description here

Please find how I expect the output to be

enter image description here

I tried using the shift(-1) function but it seems to be throwing error. I am sure I messed up with the logic/syntax. Please find the code below that I tried

 # i get the index of all records that start with number.    
 s=df1.loc[df1['VARIABLE'].str.contains('^\d')==True].index 
 # now I need to get the previous record of each group but this is 
                                                         #incorrect
 df1.loc[((df1['VARIABLE'].shift(-1).str.contains('^\d')==False) & 
 (df1['VARIABLE'].str.contains('^\d')==True))].index

Upvotes: 0

Views: 159

Answers (1)

jezrael
jezrael

Reputation: 862701

Use:

df1 = pd.DataFrame({'VARIABLE':['studyid',np.nan,'age_interview','Gender','1.Male',
                                '2.Female',np.nan, 'dob', 'eth',
                                'Ethnicity','1.Chinese','2.Indian','3.Malay']})

#first remove missing rows by column VARIABLE
df1 = df1.dropna(subset=['VARIABLE'])
#test startinf numbers
s = (df1['VARIABLE'].str.contains('^\d'))
#chain shifted values by | for OR
mask  = s | s.shift(-1)
#filtering by boolean indexing
df1 = df1[mask]
print (df1)
     VARIABLE
3      Gender
4      1.Male
5    2.Female
9   Ethnicity
10  1.Chinese
11   2.Indian
12    3.Malay

Upvotes: 1

Related Questions