NK09
NK09

Reputation: 195

Swap rows when conditions match for a column in a dataframe?

I have a dataframe which looks like below:

user_id  event_name   event_params
10           start          /pseudo
10          subcategory      /home
10        add_basket_click   /click
10       add_basket_error    /event
10            end            /end
11           start           /pseudo
11      add_basket_click    /click
11       add_basket_error   /event

I want to swap rows for event_name, add_basket_click & add_basket_error. Currently add_basket_error comes after add_basket_click, I want it come before it. Output should like below. The real dataset has 10 million rows, so looking for a pythonic answer.

user_id event_name  event_params
10           start        /pseudo
10        subcategory       /home
10       add_basket_error   /event
10       add_basket_click   /click
10            end           /end
11           start         /pseudo
11        add_basket_error  /event
11       add_basket_click   /click

Upvotes: 3

Views: 251

Answers (3)

NK09
NK09

Reputation: 195

I did the following and it worked

df['scounter'] = df.groupby('user_id').cumcount()+1

#
df1 = df[df.event_name == 'Add_Basket_Error']
df = df[df.event_name != 'Add_Basket_Error']

#
df1['scounter'] = df1['scounter'] - 1.1

#
df = df.append(df1, ignore_index=True)

#
df.sort_values(['user_id', 'scounter'], ascending = [True, True], inplace=True)
df = df.reset_index(drop=True)

Upvotes: 0

jezrael
jezrael

Reputation: 862611

Here is possible create list of all possible values in event_name by expected order, then convert column to ordered categoricals, so possible sorting by 2 columns with DataFrame.sort_values:

L = ['start','subcategory','add_basket_error','add_basket_click','end']
df['event_name'] = pd.Categorical(df['event_name'], ordered=True, categories=L)
df = df.sort_values(['user_id','event_name'])

print (df)
   user_id        event_name event_params
0       10             start      /pseudo
1       10       subcategory        /home
3       10  add_basket_error       /event
2       10  add_basket_click       /click
4       10               end         /end
5       11             start      /pseudo
7       11  add_basket_error       /event
6       11  add_basket_click       /click

EDIT:

#added separate row 1 - not changed in output
print (df)
   user_id        event_name event_params
0       10             start      /pseudo
1       10  add_basket_error       /event
2       10       subcategory        /home
3       10  add_basket_click       /click
4       10  add_basket_error       /event
5       10               end         /end
6       11             start      /pseudo
7       11  add_basket_click       /click
8       11  add_basket_error       /event

You can compare by Series.eq and also Series.shifted values, last assign back swapped rows:

m11 = df['event_name'].eq('add_basket_click')
m12 = df['event_name'].shift(-1).eq('add_basket_error')

m21 = df['event_name'].eq('add_basket_error')
m22 = df['event_name'].shift().eq('add_basket_click')


df[m21 & m22], df[m11 & m12] = df[m11 & m12].values, df[m21 & m22].values
print (df)
   user_id        event_name event_params
0       10             start      /pseudo
1       10  add_basket_error       /event
2       10       subcategory        /home
3       10  add_basket_error       /event
4       10  add_basket_click       /click
5       10               end         /end
6       11             start      /pseudo
7       11  add_basket_error       /event
8       11  add_basket_click       /click

Upvotes: 3

Chris Adams
Chris Adams

Reputation: 18647

Here is one potential solution, using boolean indexing and loc:

# Boolean series of event_name containing 'add_basket_error'
s = df.event_name.str.contains('add_basket_error')

# Create 2 frames, errors and events from boolean index 's'
errors, events = (df.loc[s[s].index].copy(), df.loc[s[s].index - 1].copy())

# Swap event and error values 
df.loc[s[s].index] = events.values
df.loc[s[s].index - 1] = errors.values
print(df)

[output]

   user_id        event_name event_params
0       10             start      /pseudo
1       10       subcategory        /home
2       10  add_basket_error       /event
3       10  add_basket_click       /click
4       10               end         /end
5       11             start      /pseudo
6       11  add_basket_error       /event
7       11  add_basket_click       /click

Upvotes: 3

Related Questions