Reputation: 405
I've got an involved situation. Let's say I have the following example dataframe of loans:
test_df = pd.DataFrame({'name': ['Jack','Jill','John','Jack','Jill'],
'date': ['2016-08-08','2016-08-08','2016-08-07','2016-08-08','2016-08-08'],
'amount': [1000.0,1500.0,2000.0,2000.0,3000.0],
'return_amount': [5000.0,2000.0,3000.0,0.0,0.0],
'return_date': ['2017-08-08','2017-08-08','2017-08-07','','2017-08-08']})
test_df.head()
amount date name return_amount return_date
0 1000.0 2016-08-08 Jack 5000.0 2017-08-08
1 1500.0 2016-08-08 Jill 2000.0 2017-08-08
2 2000.0 2016-08-07 John 3000.0 2017-08-07
3 2500.0 2016-08-08 Jack 0.0
4 2500.0 2016-08-08 Jill 0.0 2017-08-08
There are a few operations I need to perform after grouping this dataframe by name (grouping loans by person):
1) return amount
needs to allocated proportionally by the sum of amount
.
2) If return date
is missing for ANY loan for a given person, then all return_dates should be converted to empty strings ''.
I already have a function that I use to allocate the proportional return amount:
def allocate_return_amount(group):
loan_amount = group['amount']
return_amount = group['return_amount']
sum_amount = loan_amount.sum()
sum_return_amount = return_amount.sum()
group['allocated_return_amount'] = (loan_amount/sum_amount) * sum_return_amount
return group
And I use grouped_test_df = grouped_test_df.apply(allocate_return_amount)
to apply it.
What I am struggling with is the second operation I need to perform, checking if any of the loans to a person are missing a return_date
, and if so, changing all return_dates
for that person to ''.
I've found GroupBy.all in the pandas documentation, but I haven't figured out how to use it yet, anyone with experience with this?
Since this example might be a bit hard to follow, here's my ideal output for this example:
ideal_test_df.head()
amount date name return_amount return_date
0 1000.0 2016-08-08 Jack 0.0 ''
1 1500.0 2016-08-08 Jill 666.66 2017-08-08
2 2000.0 2016-08-07 John 3000.0 2017-08-07
3 2500.0 2016-08-08 Jack 0.0 ''
4 2500.0 2016-08-08 Jill 1333.33 2017-08-08
Hopefully this makes sense, and thank you in advance to any pandas expert who takes the time to help me out!
Upvotes: 4
Views: 130
Reputation: 2238
You can do it by iterating through the groups, testing the condition using any
, then setting back to the original dataframe using loc
:
test_df = pd.DataFrame({'name': ['Jack','Jill','John','Jack','Jill'],
'date': ['2016-08-08','2016-08-08','2016-08-07','2016-08-08','2016-08-08'],
'amount': [1000.0,1500.0,2000.0,2000.0,3000.0],
'return_amount': [5000.0,2000.0,3000.0,0.0,0.0],
'return_date': ['2017-08-08','2017-08-08','2017-08-07','','2017-08-08']})
grouped = test_df.groupby('name')
for name, group in grouped:
if any(group['return_date'] == ''):
test_df.loc[group.index,'return_date'] = ''
And if you want to reset return_amount
also, and don't mind the additional overhead, just add this line right after:
test_df.loc[group.index, 'return_amount'] = 0
Upvotes: 2