rwolst
rwolst

Reputation: 13662

Pandas merge on subgroups of dataframe

Consider the dataframe df:

   SCHOOL  ID  SET  NAME
0  Foo     1   0    Ben
1  Foo     2   0    Ben
2  Foo     3   1    Chris
3  Foo     4   1    Joe
4  Foo     5   0    Tom
5  Bar     1   0    Harry
6  Bar     2   1    Jeff
7  Bar     3   0    George
8  Bar     4   1    Tom

I want to create the new dataframe

   SCHOOL  ID  SET  NAME    PREV_NAME
0  Foo     1   0    Ben     None
1  Foo     2   0    Ben     Ben
2  Foo     3   1    Chris   None
3  Foo     4   1    Joe     Chris
4  Foo     5   0    Tom     Ben
5  Bar     1   0    Harry   None
6  Bar     2   1    Jeff    None
7  Bar     3   0    George  Harry
8  Bar     4   1    Tom     Jeff

where PREV_NAME is the previous NAME (sorted by ID) in the same SCHOOL and SET as a given record.


If there were no SET condition, then I could do something like

df_copy = df.copy()
df_copy['ID'] = df_copy['ID'] + 1
df.merge(df_copy, on=['SCHOOL', 'ID'], how='left')

As there is a SET condition, we need a way of grouping each unique (SCHOOL, SET) pair and then applying the above merge (noting that we can no longer use the ID) with the new index.

How would I achieve this using Pandas?

Upvotes: 0

Views: 174

Answers (1)

cs95
cs95

Reputation: 402333

Seems like you want groupby + shift on NAME -

df['PREV_NAME'] = df.groupby(['SCHOOL' ,'SET']).NAME.shift()
df

  SCHOOL  ID  SET    NAME PREV_NAME
0    Foo   1    0     Ben       NaN
1    Foo   2    0     Ben       Ben
2    Foo   3    1   Chris       NaN
3    Foo   4    1     Joe     Chris
4    Foo   5    0     Tom       Ben
5    Bar   1    0   Harry       NaN
6    Bar   2    1    Jeff       NaN
7    Bar   3    0  George     Harry
8    Bar   4    1     Tom      Jeff

Upvotes: 1

Related Questions