How to fetch preceding ids on fly using pandas

Question

I have a data frame like as shown below

df = pd.DataFrame({'subject_id':[11,11,11,12,12,12],
                   'test_date':['02/03/2012 10:24:21','05/01/2019 10:41:21','12/13/2011 11:14:21','10/11/1992 11:14:21','02/23/2002 10:24:21','07/19/2005 10:24:21'],
                   'original_enc':['A742','B963','C354','D563','J323','G578']})

hash_file = pd.DataFrame({'source_enc':['A742','B963','C354','D563','J323','G578'],
                          'hash_id':[1,2,3,4,5,6]})

cols = ["subject_id","test_date","enc_id","previous_enc_id"]
test_df = pd.DataFrame(columns=cols)
test_df.head()

I would like to do two things here

Map original_enc to their corresponding hash_id and store it in enc_id
Find the previous hash_id for each subject based on their current hash_id and store it in previous_enc_id

I tried the below

test_df['subject_id'] = df['subject_id']
test_df['test_date'] = df['test_date']
test_df['enc_id'] = df['original_enc'].map(hash_file)
test_df = test_df.sort_values(['subject_id','test_date'],ascending=True)
test_df['previous_enc_id'] = test_df.groupby(['subject_id','test_date'])['enc_id'].shift(1)

However, I don't get the expected output for the previous_enc_id column as it is all NA.

I expect my output to be like as shown below. You see NA in the expected row for the 1st row of every subject because that's their 1st encounter. There is no info to look back to.

jezrael · Accepted Answer

Use only one column for groupby:

test_df['previous_enc_id'] = test_df.groupby('subject_id')['enc_id'].shift()

How to fetch preceding ids on fly using pandas

Answers (1)

Related Questions