The Great
The Great

Reputation: 7733

How to fetch preceding ids on fly using pandas

I have a data frame like as shown below

df = pd.DataFrame({'subject_id':[11,11,11,12,12,12],
                   'test_date':['02/03/2012 10:24:21','05/01/2019 10:41:21','12/13/2011 11:14:21','10/11/1992 11:14:21','02/23/2002 10:24:21','07/19/2005 10:24:21'],
                   'original_enc':['A742','B963','C354','D563','J323','G578']})

hash_file = pd.DataFrame({'source_enc':['A742','B963','C354','D563','J323','G578'],
                          'hash_id':[1,2,3,4,5,6]})

cols = ["subject_id","test_date","enc_id","previous_enc_id"]
test_df = pd.DataFrame(columns=cols)
test_df.head()

I would like to do two things here

  1. Map original_enc to their corresponding hash_id and store it in enc_id

  2. Find the previous hash_id for each subject based on their current hash_id and store it in previous_enc_id

I tried the below

test_df['subject_id'] = df['subject_id']
test_df['test_date'] = df['test_date']
test_df['enc_id'] = df['original_enc'].map(hash_file)
test_df = test_df.sort_values(['subject_id','test_date'],ascending=True)
test_df['previous_enc_id'] = test_df.groupby(['subject_id','test_date'])['enc_id'].shift(1)

However, I don't get the expected output for the previous_enc_id column as it is all NA.

I expect my output to be like as shown below. You see NA in the expected row for the 1st row of every subject because that's their 1st encounter. There is no info to look back to.

enter image description here

Upvotes: 1

Views: 21

Answers (1)

jezrael
jezrael

Reputation: 863166

Use only one column for groupby:

test_df['previous_enc_id'] = test_df.groupby('subject_id')['enc_id'].shift()

Upvotes: 1

Related Questions