Reputation: 1045
The df looks like this:
df.columns = ['ReportDate', 'ClientId', 'ClientRevenue']
I want to get list of all clients reporting higher revenue b/w 2 dates. Here's some non-tested, outline code but wondering if there's a more direct, Pythonic approach:
enddatedf = df.loc[df['ReportDate'] == endDate]
startdatedf = df.loc[df['ReportDate'] == startDate]
endclients = enddatedf['ClientId'].unique()
startclients = startdatedf['ClientId'].unique()
commonclients = list(set(startclients).intersect(set(endclients)) #because clients might have dropped off in b/w
risingclients = []
for client in commonclients:
startrevenue = startdatedf.loc[startdatedf['ClientId'] == client, 'ClientRevenue'].values[0]
endrevenue = enddatedf.loc[enddatedf['ClientId'] == client, 'ClientRevenue'].values[0]
if endrevenue > startrevenue:
risingclients.append(client)
Thanks
Upvotes: 2
Views: 346
Reputation: 3184
Creating data. Please provide data in your questions. :)
startdate = pd.datetime(2019, 1, 1)
enddate = pd.datetime(2019, 3, 31)
df = pd.DataFrame(
data={
"ReportDate": [startdate, enddate, startdate, enddate, startdate, enddate],
"ClientId": [2, 1, 3, 3, 1, 2],
"ClientRevenue": [1432, 8493, 2316, 2145, 3211, 8763],
}
)
print(df)
ReportDate ClientId ClientRevenue
0 2019-01-01 2 1432
1 2019-03-31 1 8493
2 2019-01-01 3 2316
3 2019-03-31 3 2145
4 2019-01-01 1 3211
5 2019-03-31 2 8763
First step is to filter the df for the startdate and enddate.
df = df.loc[((df['ReportDate']==startdate) | (df['ReportDate']==enddate)),:]
Next sort the dataframe so that you will have clients together, in date order.
df = df.sort_values(['ClientId','ReportDate'])
ReportDate ClientId ClientRevenue
4 2019-01-01 1 3211
1 2019-03-31 1 8493
0 2019-01-01 2 1432
5 2019-03-31 2 8763
2 2019-01-01 3 2316
3 2019-03-31 3 2145
Next, subtract the startdate ClientRevenue, from the enddate ClientRevenue. If the value is positive, then the client had growth between the two dates.
result = df.groupby('ClientId').last() - df.groupby('ClientId').first()
print(result)
ReportDate ClientRevenue
ClientId
1 89 days 5282
2 89 days 7331
3 89 days -171
Finally, filter the result dataframe for positive 'ClientRevenue' and put the index ('ClientId') to list.
print("ClientId with positive return: ", result[result['ClientRevenue']>0].index.tolist())
ClientId with positive return: [1, 2]
EDIT I missed the part about clients dropping off, but I went back and tested and it still works.
Adding in ClientId = 0 but only with a startdate.
ReportDate ClientId ClientRevenue
0 2019-01-01 0 1324
1 2019-01-01 2 1432
2 2019-03-31 1 8493
3 2019-01-01 3 2316
4 2019-03-31 3 2145
5 2019-01-01 1 3211
6 2019-03-31 2 8763
result calculation is:
ReportDate ClientRevenue
ClientId
0 0 days 0
1 89 days 5282
2 89 days 7331
3 89 days -171
ClientId with positive return: [1, 2]
Upvotes: 1
Reputation: 2500
df = df.sort_values(['ReportDate'], ascending=[True]) #Ensure your ReportDate is datetime column
df = df[(df['ReportDate'] > startDate) & (df['date'] <= endDate)] #You can have startDate, endDate as variables at top of your code section
del df['ReportDate']
df = df.groupby(['ClientId'],as_index=False).sum()
df = df.sort_values(['ClientRevenue'], ascending=[False])
top5 = df.head(5) #Selecting the top 5 clients
Upvotes: 0