Reputation: 3580
I have two columns in a dataframe that has date and name. On a daily basis, I would like to count the number of records that enter and are deleted on a daily basis.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Name': ['A', 'B', 'C', 'D','E']})
df1['Date'] = '2016-01-01'
df2 = pd.DataFrame({'Name': ['C', 'D','E','F']})
df2['Date'] = '2016-01-02'
df3 = pd.DataFrame({'Name': ['B', 'C', 'D','E','F']})
df3['Date'] = '2016-01-03'
df4 = pd.DataFrame({'Name': ['A', 'D', 'E','H']})
df4['Date'] = '2016-01-04'
df=pd.concat([df1,df2,df3,df4])
df=df.reset_index(drop=True)
df
I would like an output that, for each date, counts the number of additions and subtractions by date. For example, on 2016-01-02, A and B are gone, but F is new, and 3 stayed the same. I would like the output to look like the following:
Date add del same
2016-01-02 1 2 3
I have tried to do a full outer join , then count the banks respectively, but that is sooooo inefficient!
Does anyone have any ideas of a more efficient way of doing this? Thank you so much!
Upvotes: 0
Views: 51
Reputation: 4744
I have an attempt but I cannot say whether it will be as fast or as stable as whatever your full outer join might be but it works with the example above.
Carrying from where you left off then,
df['Value'] = 1
df = df.set_index(['Date', 'Name']).unstack('Name').fillna(0)
df = (df - df.shift(1))
df = pd.DataFrame({i: j.value_counts() for i, j in df.iterrows()}).T.fillna(0)
df.columns = ['del', 'same', 'add']
print(df)
del same add
2016-01-01 0.0 0.0 0.0
2016-01-02 2.0 4.0 1.0
2016-01-03 0.0 6.0 1.0
2016-01-04 3.0 2.0 2.0
As you can see it is not beautiful, pythonic code. It depends upon the shift which means that the Date must be in order. You could use pd.to_datetime()
as an actual datetime to ensure that this happens.
It also used a dict comprehension that is then turned back into a DataFrame. Finally, it relies on the del, same and add columns being in that order. You could do an actual mapping as opposed to an overwrite.
I will be interested to know how this compares with the join in terms of speed. Do let us know and we can all learn something!
Upvotes: 1