Shubham R

Reputation: 7644

Iterate through multiple rows of dataframe and dropping rows based on condition

I have a dataframe:

  column1
19:08:22
ABCD
19:08:40
WXYZ
AAAA
19:09:02
XXXX
ZZZZ
19:09:49
ABCD

I want to keep only those rows where a row containing a time (its dtype is also string) is followed by text values in the next two consecutive rows.

I'm looking for this output:

  column1
19:08:40
WXYZ
AAAA
19:09:02
XXXX
ZZZZ

Or in a better way:

column1   text1  text2
19:08:40  WXYZ   AAAA
19:09:02  XXXX   ZZZZ

I'm not sure how to approach this problem.

I thought of using .shift(2) to compare the rows, but it isn't working. I also thought of running an iterative loop such as:

for index, row in df.iterrows():
    current_row = row
    # Check the alternate rows; if they contain a time value, remove them.

But this doesn't seem like the right way to attempt this problem. Any help or direction is appreciated.
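For reference, the sample frame above can be rebuilt like this (a minimal reproduction, assuming the single string column shown in the post):

```python
import pandas as pd

# Sample data copied from the post: one string column mixing
# time stamps and text values.
df = pd.DataFrame({'column1': ['19:08:22', 'ABCD', '19:08:40', 'WXYZ', 'AAAA',
                               '19:09:02', 'XXXX', 'ZZZZ', '19:09:49', 'ABCD']})
print(df)
```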

Upvotes: 0

Views: 230

Answers (2)

Vaishali

Reputation: 38415

You can combine the conditions and reconstruct a DataFrame:

cond1 = (df['column1'].str.contains(r'\d+')
         & df['column1'].shift(-1).str.contains('[A-Za-z]+')
         & df['column1'].shift(-2).str.contains('[A-Za-z]+').fillna(False))

column1_idx = df[cond1].index
text1_idx = df[cond1].index + 1
text2_idx = df[cond1].index + 2

pd.DataFrame({'column1': df.iloc[column1_idx, 0].reset_index(drop=True),
              'text1': df.iloc[text1_idx, 0].reset_index(drop=True),
              'text2': df.iloc[text2_idx, 0].reset_index(drop=True)})

    column1     text1   text2
0   19:08:40    WXYZ    AAAA
1   19:09:02    XXXX    ZZZZ
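A small sketch of why the trailing fillna(False) matters (sample data rebuilt from the question): shift(-2) leaves NaN in the last rows, so the final '19:09:49' block must be forced to False rather than NaN.

```python
import pandas as pd

df = pd.DataFrame({'column1': ['19:08:22', 'ABCD', '19:08:40', 'WXYZ', 'AAAA',
                               '19:09:02', 'XXXX', 'ZZZZ', '19:09:49', 'ABCD']})

# shift(-2) produces NaN for the last two rows, so str.contains also
# returns NaN there; fillna(False) makes the mask a clean boolean.
raw = df['column1'].shift(-2).str.contains('[A-Za-z]+')
print(raw.tail(3))  # the last two entries are NaN

cond1 = (df['column1'].str.contains(r'\d+')
         & df['column1'].shift(-1).str.contains('[A-Za-z]+')
         & df['column1'].shift(-2).str.contains('[A-Za-z]+').fillna(False))
print(df[cond1])  # only the times starting a time + text + text block
```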

Upvotes: 2

Scott Boston

Reputation: 153500

Try:

grp = df['column1'].str.match(r'\d{2}:\d{2}:\d{2}').cumsum()
m = df.groupby(grp)['column1'].transform('count') > 2
df.loc[m]

Output:

    column1
2  19:08:40
3      WXYZ
4      AAAA
5  19:09:02
6      XXXX
7      ZZZZ

Details:

  • First, create groups by using a regex to match the "time" pattern, then cumsum to block records together.
  • Next, use groupby with transform to count the number of rows in each group.
  • Lastly, filter the dataframe using boolean indexing based on the number of records in each group.
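The intermediate steps can be inspected on the sample data (rebuilt here from the question):

```python
import pandas as pd

df = pd.DataFrame({'column1': ['19:08:22', 'ABCD', '19:08:40', 'WXYZ', 'AAAA',
                               '19:09:02', 'XXXX', 'ZZZZ', '19:09:49', 'ABCD']})

# Each time stamp starts a new block: cumsum over the boolean match
# gives a time row and its following text rows the same group number.
grp = df['column1'].str.match(r'\d{2}:\d{2}:\d{2}').cumsum()
print(grp.tolist())  # [1, 1, 2, 2, 2, 3, 3, 3, 4, 4]

# Keep only blocks with more than two rows (time + at least two texts).
m = df.groupby(grp)['column1'].transform('count') > 2
print(df.loc[m, 'column1'].tolist())
```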

Update, going further:

df['grp'] = df['column1'].str.match(r'\d{2}:\d{2}:\d{2}').cumsum()
m = df.groupby('grp')['column1'].transform('count') > 2
df_out = df.loc[m].copy()
df_out['time'] = df_out['column1'].str.extract(r'(\d{2}:\d{2}:\d{2})', expand=False).ffill()
df_out = df_out.query('column1 != time')
df_out.set_index(['time', df_out.groupby('time').cumcount() + 1])['column1'].unstack().add_prefix('text')

Output:

         text1 text2
time                
19:08:40  WXYZ  AAAA
19:09:02  XXXX  ZZZZ
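If, as in the sample, every kept block is exactly one time stamp plus two text rows, the same mask also allows a shortcut via a plain reshape (an assumption that would break on blocks with more than two text rows):

```python
import pandas as pd

df = pd.DataFrame({'column1': ['19:08:22', 'ABCD', '19:08:40', 'WXYZ', 'AAAA',
                               '19:09:02', 'XXXX', 'ZZZZ', '19:09:49', 'ABCD']})

grp = df['column1'].str.match(r'\d{2}:\d{2}:\d{2}').cumsum()
m = df.groupby(grp)['column1'].transform('count') > 2

# Assumption: every qualifying block is exactly three rows
# (one time stamp followed by exactly two text rows).
wide = pd.DataFrame(df.loc[m, 'column1'].to_numpy().reshape(-1, 3),
                    columns=['column1', 'text1', 'text2'])
print(wide)
```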

Upvotes: 2
