Reputation: 7733
I have a data frame like as shown below
df = pd.DataFrame({'subject_id':[11,11,11,12,12,12],
'test_date':['02/03/2012 10:24:21','05/01/2019 10:41:21','12/13/2011 11:14:21','10/11/1992 11:14:21','02/23/2002 10:24:21','07/19/2005 10:24:21'],
'original_enc':['A742','B963','C354','D563','J323','G578']})
hash_file = pd.DataFrame({'source_enc':['A742','B963','C354','D563','J323','G578'],
'hash_id':[1,2,3,4,5,6]})
cols = ["subject_id","test_date","enc_id","previous_enc_id"]
test_df = pd.DataFrame(columns=cols)
test_df.head()
I would like to do two things here
Map original_enc
to their corresponding hash_id
and store it in enc_id
Find the previous hash_id
for each subject based on their current hash_id
and store it in previous_enc_id
I tried the below
test_df['subject_id'] = df['subject_id']
test_df['test_date'] = df['test_date']
test_df['enc_id'] = df['original_enc'].map(hash_file)
test_df = test_df.sort_values(['subject_id','test_date'],ascending=True)
test_df['previous_enc_id'] = test_df.groupby(['subject_id','test_date'])['enc_id'].shift(1)
However, I don't get the expected output for the previous_enc_id
column as it is all NA
.
I expect my output to be like as shown below. You see NA
in the expected row for the 1st row of every subject because that's their 1st encounter. There is no info to look back to.
Upvotes: 1
Views: 21
Reputation: 863166
Use only one column for groupby
:
test_df['previous_enc_id'] = test_df.groupby('subject_id')['enc_id'].shift()
Upvotes: 1