Reputation: 309
Task:
Calculate the frequency of each ID for each month of 2021
Frequency formula: Activity period (Length of time between last activity and first activity) / (Number of activity Days - 1)
e.g. ID 1 - Month 2: Activity Period (2021-02-23 - 2021-02-18 = 5 days) / (3 active days - 1) == Frequency = 2,5
Sample:
times = [
'2021-02-18',
'2021-02-22',
'2021-02-23',
'2021-04-23',
'2021-01-18',
'2021-01-19',
'2021-01-20',
'2021-01-03',
'2021-02-04',
'2021-02-04'
]
id = [1, 1, 1, 1, 44, 44, 44, 46, 46, 46]
df = pd.DataFrame({'ID':id, 'Date': pd.to_datetime(times)})
df = df.reset_index(drop=True)
print(df)
ID Date
0 1 2021-02-18
1 1 2021-02-22
2 1 2021-02-23
3 1 2021-04-23
4 44 2021-01-18
5 44 2021-01-19
6 44 2021-01-20
7 46 2021-01-03
8 46 2021-02-04
9 46 2021-02-04
Desired Output:
If frequency negative == 0
id 01_2021 02_2021 03_2021 04_2021
0 1 0 2 0 0
1 44 1 0 0 0
2 46 0 0 0 0
Upvotes: 0
Views: 197
Reputation: 35686
Try a pivot_table with a custom aggfunc:
# Create Columns For Later
dr = pd.date_range(start=df['Date'].min(),
end=df['Date'].max() + pd.offsets.MonthBegin(1), freq='M') \
.map(lambda dt: dt.strftime('%m_%Y'))
new_df = (
df.pivot_table(
index='ID',
# Columns are dates in MM_YYYY format
columns=df['Date'].dt.strftime('%m_%Y'),
# Custom Agg Function
aggfunc=lambda x: (x.max() - x.min()) /
pd.offsets.Day(max(1, len(x) - 1))
# max(1, len(x) -1) to prevent divide by 0
)
# Fix Axis Names and Column Levels
.droplevel(0, axis=1)
.rename_axis(None, axis=1)
# Reindex to include every month from min to max date
.reindex(dr, axis=1)
# Clip to exclude negatives
.clip(lower=0)
# Fillna with 0
.fillna(0)
# Reset index
.reset_index()
)
print(new_df)
new_df
:
ID 01_2021 02_2021 03_2021 04_2021
0 1 0.0 2.5 0.0 0.0
1 44 1.0 0.0 0.0 0.0
2 46 0.0 0.0 0.0 0.0
Upvotes: 1
Reputation: 442
You will need to pivot the table, but first if you want only the month and year of the date, you need to transform it.
df['Date'] = df.Date.map(lambda s: "{}_{}".format(s.year,s.month))
df['counts'] = 1
df_new = pd.pivot_table(df, index=['ID'],
columns=['Date'], aggfunc=np.sum)
Upvotes: 0