Reputation: 143
I have a dataframe that looks like this:
import pandas as pd

dict_1 = {"Code" : ['A', 'A', 'A', 'A', 'A', 'A'],
          'Period' : ['2022-04-29','2022-04-29', '2022-04-30', '2022-05-01', '2022-05-01', '2022-05-01']}
df_1 = pd.DataFrame(dict_1)
# Normalise the dates to YYYY-MM-DD strings
df_1['Period'] = pd.to_datetime(df_1['Period']).dt.strftime("%Y-%m-%d")
df_1.head(10)
Code | Period |
---|---|
A | 2022-04-29 |
A | 2022-04-29 |
A | 2022-04-30 |
A | 2022-05-01 |
A | 2022-05-01 |
A | 2022-05-01 |
I need to add a new column, Count, holding a cumulative count that restarts whenever a new month begins.
Here is what I have tried so far:
df_2 = df_1.groupby(['Period', 'Code'], as_index=False).size()
df_2.head()
Code | Period | size |
---|---|---|
A | 2022-04-29 | 2 |
A | 2022-04-30 | 1 |
A | 2022-05-01 | 3 |
def Cumulative(lists):
    cu_list = []
    length = len(lists)
    cu_list = [sum(lists[0:x:1]) for x in range(0, length+1)]
    return cu_list[1:]
df_2['Count'] = Cumulative(df_2['size'])
df_2.head()
Code | Period | size | Count |
---|---|---|---|
A | 2022-04-29 | 2 | 2 |
A | 2022-04-30 | 1 | 3 |
A | 2022-05-01 | 3 | 6 |
For the row with a Period of 2022-05-01, the total count should be 3 instead of 6 because a new month has started.
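For context, the helper above appears to behave like an ungrouped running total, which would explain why the value never resets. A minimal sketch, assuming df_2 is rebuilt by hand from the table above (plain_total is a hypothetical name used only for illustration):

```python
import pandas as pd

# Rebuild the grouped frame shown above by hand (assumed values from the table)
df_2 = pd.DataFrame({
    "Period": ["2022-04-29", "2022-04-30", "2022-05-01"],
    "Code": ["A", "A", "A"],
    "size": [2, 1, 3],
})

# An ungrouped running total: nothing tells it where a month ends,
# so the May row keeps accumulating on top of April
plain_total = df_2["size"].cumsum().tolist()
print(plain_total)  # [2, 3, 6]
```

The last value is 6 because the sum runs over the whole column rather than within each month.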
Upvotes: 2
Views: 412
Reputation: 28322
Use groupby on the month (and the year, to be safe) extracted from Period, then apply cumsum:
year_col = pd.to_datetime(df_2['Period']).dt.year
month_col = pd.to_datetime(df_2['Period']).dt.month
df_2['count'] = df_2.groupby([year_col, month_col])['size'].cumsum()
Result:
Period Code size count
0 2022-04-29 A 2 2
1 2022-04-30 A 1 3
2 2022-05-01 A 3 3
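As a possible variation on the same idea, the year and month can be folded into a single monthly key with dt.to_period("M"). This is only a sketch starting from the question's sample data; month_key is a hypothetical name, and the column is called Count here to match the question:

```python
import pandas as pd

# Sample data from the question
df_1 = pd.DataFrame({
    "Code": ["A"] * 6,
    "Period": ["2022-04-29", "2022-04-29", "2022-04-30",
               "2022-05-01", "2022-05-01", "2022-05-01"],
})

# Rows per (Period, Code), as in the question
df_2 = df_1.groupby(["Period", "Code"], as_index=False).size()

# One key per calendar month (year and month combined), e.g. 2022-04
month_key = pd.to_datetime(df_2["Period"]).dt.to_period("M")

# Running total that restarts whenever month_key changes
df_2["Count"] = df_2.groupby(month_key)["size"].cumsum()
print(df_2)
```

If the data contained several codes and each needed its own running total, adding the Code column to the grouping keys would likely be the natural extension.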
Upvotes: 4