Reputation: 2219
I have a df that stores medical records, and I need to identify the first site each person goes to after their discharge date. The df is grouped by ID.
There are 3 options: (1) within a group, if any row has a begin_date that matches the first row's end_date, return that row's location as the first site (if two rows meet this condition, either is correct); (2) if no row meets the first condition, select the first location after the initial location; (3) if neither condition 1 nor condition 2 applies, return 'Home'.
ID  color  begin_date  end_date    location
1   red    2017-01-01  2017-01-07  initial
1   green  2017-01-05  2017-01-07  nursing
1   blue   2017-01-07  2017-01-15  rehab
1   red    2017-01-11  2017-01-22  Health
2   red    2017-02-22  2017-02-26  initial
2   green  2017-02-26  2017-02-28  nursing
2   blue   2017-02-26  2017-02-28  rehab
3   red    2017-03-11  2017-03-22  initial
4   red    2017-04-01  2017-04-07  initial
4   green  2017-04-05  2017-04-07  nursing
4   blue   2017-04-10  2017-04-15  Health
Expected Result:
ID  first_site
1   rehab
2   nursing
3   Home
4   nursing
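For anyone who wants to reproduce this, the sample table can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table, but parsing the two date columns with `pd.to_datetime` is an assumption, since the dtypes of the real df are not shown.

```python
import pandas as pd

# Values copied from the sample table; a default integer index (0..10) is created.
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4],
    'color': ['red', 'green', 'blue', 'red', 'red', 'green', 'blue',
              'red', 'red', 'green', 'blue'],
    'begin_date': ['2017-01-01', '2017-01-05', '2017-01-07', '2017-01-11',
                   '2017-02-22', '2017-02-26', '2017-02-26', '2017-03-11',
                   '2017-04-01', '2017-04-05', '2017-04-10'],
    'end_date': ['2017-01-07', '2017-01-07', '2017-01-15', '2017-01-22',
                 '2017-02-26', '2017-02-28', '2017-02-28', '2017-03-22',
                 '2017-04-07', '2017-04-07', '2017-04-15'],
    'location': ['initial', 'nursing', 'rehab', 'Health',
                 'initial', 'nursing', 'rehab', 'initial',
                 'initial', 'nursing', 'Health'],
})

# Assumption: the dates are stored as datetimes rather than plain strings.
df['begin_date'] = pd.to_datetime(df['begin_date'])
df['end_date'] = pd.to_datetime(df['end_date'])
```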
My attempt is below. I get the error "None of [Int64Index([8], dtype='int64')] are in the [index]", and I could not find much help online about it. If I remove the elif condition on val2, the error goes away.
def First(x):
    # compare each group first and see if there are any locations that match
    val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
    # find the first location after the initial stay
    val2 = x.loc[x[x.location=='initial'].index+1, 'location']
    if not val.empty:
        return val.iloc[0]
    elif not val2.empty:
        return val2.iloc[0]
    else:
        return 'Home'

final = df.groupby('ID').apply(First).reset_index(name='first_site')
print(final)
What am I doing wrong?
Upvotes: 1
Views: 81
Reputation: 23753
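The group with 'ID' == 3 has only one row, so the val2 expression is trying to index a position that isn't there: the label of the 'initial' row plus one (the 8 in your error message) belongs to the next group, not to this one.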
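A minimal sketch of what goes wrong, using a cut-down, made-up frame with the same index labels as the failing rows (the frame and the names `group` and `next_labels` are only for illustration):

```python
import pandas as pd

# Cut-down stand-in for the real data: label 7 is the only row of ID 3,
# labels 8 and 9 belong to ID 4.
df = pd.DataFrame(
    {'ID': [3, 4, 4], 'location': ['initial', 'initial', 'nursing']},
    index=[7, 8, 9],
)

group = df[df['ID'] == 3]                      # what First receives for ID 3
next_labels = group[group.location == 'initial'].index + 1

print(next_labels.tolist())                    # [8]
print(next_labels.isin(group.index).any())     # False: label 8 is not in this group

try:
    group.loc[next_labels, 'location']         # label-based lookup of a missing label
    raised = False
except KeyError:
    raised = True

print(raised)                                  # True
```

`.loc` looks rows up by label, so adding 1 to a label only works when the resulting label also exists inside the same group.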
Check to see if a group has only one row first.
def First(x):
    if len(x) == 1:
        # a single-row group has nothing after the initial stay
        return_value = 'Home'
    else:
        val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
        val2 = x.loc[x[x.location=='initial'].index+1, 'location']
        if not val.empty:
            return_value = val.iloc[0]
        elif not val2.empty:
            return_value = val2.iloc[0]
        else:
            # fall back so return_value is always defined
            return_value = 'Home'
    return return_value
>>> gb = df.groupby('ID')
>>> gb.apply(First)
ID
1      rehab
2    nursing
3       Home
4    nursing
dtype: object
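If you would rather not do arithmetic on index labels at all, a position-based variant is one option. This is only a sketch, not a drop-in replacement: `first_site` is a hypothetical name, it assumes the first row of every group is the initial stay, and unlike your `val` it never considers the first row itself as a match. The dates are kept as strings here for brevity; the equality test works the same way on datetimes.

```python
import pandas as pd

rows = [
    (1, '2017-01-01', '2017-01-07', 'initial'),
    (1, '2017-01-05', '2017-01-07', 'nursing'),
    (1, '2017-01-07', '2017-01-15', 'rehab'),
    (1, '2017-01-11', '2017-01-22', 'Health'),
    (2, '2017-02-22', '2017-02-26', 'initial'),
    (2, '2017-02-26', '2017-02-28', 'nursing'),
    (2, '2017-02-26', '2017-02-28', 'rehab'),
    (3, '2017-03-11', '2017-03-22', 'initial'),
    (4, '2017-04-01', '2017-04-07', 'initial'),
    (4, '2017-04-05', '2017-04-07', 'nursing'),
    (4, '2017-04-10', '2017-04-15', 'Health'),
]
df = pd.DataFrame(rows, columns=['ID', 'begin_date', 'end_date', 'location'])


def first_site(group):
    """Hypothetical variant of First that uses positions instead of labels."""
    discharge = group['end_date'].iloc[0]   # end_date of the first (initial) row
    later = group.iloc[1:]                  # every row after the first, by position
    same_day = later.loc[later['begin_date'] == discharge, 'location']
    if not same_day.empty:
        return same_day.iloc[0]             # option 1: begins on the discharge date
    if not later.empty:
        return later['location'].iloc[0]    # option 2: next row after the initial stay
    return 'Home'                           # option 3: nothing after the initial stay


# Iterate over the groups directly; each group keeps its original index labels.
result = {key: first_site(group) for key, group in df.groupby('ID')}
print(result)
```

Because `iloc[1:]` simply yields an empty frame for a one-row group, no separate length check is needed.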
Upvotes: 2