CandleWax

Reputation: 2219

How to apply a function against a dataframe in pandas?

I have a df that stores medical records and I need to identify the first site that a person goes to after their discharge date. The df is grouped by ID.

There are three cases:
(1) Within a group, if any row has a begin_date that matches the first row's end_date, return that row's location as the first site (if two rows meet this condition, either is correct).
(2) If no row satisfies case 1, select the first location after the initial location.
(3) Otherwise, if neither case 1 nor case 2 applies, return 'Home'.

ID    color  begin_date    end_date     location
1     red    2017-01-01    2017-01-07   initial
1     green  2017-01-05    2017-01-07   nursing
1     blue   2017-01-07    2017-01-15   rehab
1     red    2017-01-11    2017-01-22   Health
2     red    2017-02-22    2017-02-26   initial
2     green  2017-02-26    2017-02-28   nursing
2     blue   2017-02-26    2017-02-28   rehab
3     red    2017-03-11    2017-03-22   initial
4     red    2017-04-01    2017-04-07   initial
4     green  2017-04-05    2017-04-07   nursing
4     blue   2017-04-10    2017-04-15   Health

Expected Result:

ID   first_site
1    rehab
2    nursing
3    Home
4    nursing

My attempt is below. I get the error "None of [Int64Index([8], dtype='int64')] are in the [index]", and I could not find much help online about it. If I remove the elif branch that uses val2, the error goes away.

def First(x):
   #compare each group first and see if there are any locations that match 
   val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
   #find the first location after the initial stay
   val2 = x.loc[x[x.location=='initial'].index+1, 'location']
   if not val.empty:
       return val.iloc[0]
   elif not val2.empty:
       return val2.iloc[0]
   else:
       return 'Home'

final = df.groupby('ID').apply(First).reset_index(name='first_site')
print (final)

What am I doing wrong?

Upvotes: 1

Views: 81

Answers (1)

wwii

Reputation: 23753

The group for ID 3 has only one row, so the val2 expression asks .loc for an index label (8, the initial row's label plus one) that does not exist in that group.
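To see the failure in isolation, here is a minimal sketch that rebuilds the sample data from the question and repeats the lookup for ID 3 only. It assumes the frame has the default 0..10 integer index, which matches the label 8 in the error message; the names `group`, `next_labels` and `raised` are only for illustration.

```python
import pandas as pd

# Sample data copied from the question; a default 0..10 integer index is assumed.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4],
    "begin_date": ["2017-01-01", "2017-01-05", "2017-01-07", "2017-01-11",
                   "2017-02-22", "2017-02-26", "2017-02-26",
                   "2017-03-11",
                   "2017-04-01", "2017-04-05", "2017-04-10"],
    "end_date": ["2017-01-07", "2017-01-07", "2017-01-15", "2017-01-22",
                 "2017-02-26", "2017-02-28", "2017-02-28",
                 "2017-03-22",
                 "2017-04-07", "2017-04-07", "2017-04-15"],
    "location": ["initial", "nursing", "rehab", "Health",
                 "initial", "nursing", "rehab",
                 "initial",
                 "initial", "nursing", "Health"],
})

# The group for ID 3 holds a single row, whose index label is 7.
group = df[df["ID"] == 3]

# Same expression as in val2: the label of the 'initial' row plus one.
next_labels = group[group.location == "initial"].index + 1

print(group.index.tolist())   # [7]
print(next_labels.tolist())   # [8]

# .loc looks rows up by label, and label 8 belongs to the ID 4 group,
# so the lookup inside the ID 3 group raises a KeyError.
raised = False
try:
    group.loc[next_labels, "location"]
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

The other groups do not fail only because each of them happens to contain the label that follows its 'initial' row.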

Check first whether the group has only one row.

def First(x):
    # Default to 'Home' so return_value is always defined.
    return_value = 'Home'
    # Only look for a next site when rows exist after the initial stay.
    if len(x) > 1:
        val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
        val2 = x.loc[x[x.location=='initial'].index+1, 'location']
        if not val.empty:
            return_value = val.iloc[0]
        elif not val2.empty:
            return_value = val2.iloc[0]
    return return_value

gb = df.groupby('ID')

>>> gb.apply(First)
ID
1      rehab
2    nursing
3       Home
4    nursing
dtype: object
>>>
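As an alternative, the label arithmetic can be avoided by working with positions. The following is only a sketch, not the approach used above: `first_site` is a hypothetical name, and it assumes that the first row of every group is the initial stay and that rows are already in the order shown in the question. Unlike the original function, it leaves the first row out of the date comparison.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings; equality still works).
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4],
    "begin_date": ["2017-01-01", "2017-01-05", "2017-01-07", "2017-01-11",
                   "2017-02-22", "2017-02-26", "2017-02-26",
                   "2017-03-11",
                   "2017-04-01", "2017-04-05", "2017-04-10"],
    "end_date": ["2017-01-07", "2017-01-07", "2017-01-15", "2017-01-22",
                 "2017-02-26", "2017-02-28", "2017-02-28",
                 "2017-03-22",
                 "2017-04-07", "2017-04-07", "2017-04-15"],
    "location": ["initial", "nursing", "rehab", "Health",
                 "initial", "nursing", "rehab",
                 "initial",
                 "initial", "nursing", "Health"],
})


def first_site(group):
    """Hypothetical helper; assumes row 0 of the group is the initial stay."""
    discharge = group["end_date"].iloc[0]
    # Rows after the initial stay, selected by position rather than by label.
    rest = group.iloc[1:]
    # Case 1: a later row that begins on the discharge date.
    same_day = rest.loc[rest["begin_date"] == discharge, "location"]
    if not same_day.empty:
        return same_day.iloc[0]
    # Case 2: otherwise the first row after the initial stay.
    if not rest.empty:
        return rest["location"].iloc[0]
    # Case 3: nothing after the initial stay.
    return "Home"


# Iterate over the groups directly and collect one result per ID.
result = pd.Series(
    {key: first_site(g) for key, g in df.groupby("ID")},
    name="first_site",
)
print(result)
```

Because `iloc[1:]` is simply empty for a one-row group, no separate length check is needed, and the result does not depend on what the index labels happen to be.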

Upvotes: 2
