Reputation: 3
The third column shows the time data for each user from 04/01/2019 to 04/30/2019. For some users, some days are missing. What can I do to keep the monthly range for each user, with the remaining columns being filled with zeros?
Upvotes: 0
Views: 123
Reputation: 35626
Try MultiIndex.from_product
with the unique
user_ids and pd.date_range
from start to end date:
cols = ['user_id', 'date']
midx = pd.MultiIndex.from_product(
(df['user_id'].unique(),
pd.date_range(start='2019-04-01', end='2019-04-30', freq='D')),
names=cols
)
df = (
df.set_index(cols)
.reindex(midx, fill_value=0)
.reset_index()
)
Fix day column (optional):
df['day'] = df['date'].dt.day
Sample Frame:
df = pd.DataFrame({
'user_id': [99, 99, 100, 100],
'date': ['2019-04-01', '2019-04-15', '2019-04-03', '2019-04-28'],
'day': [1, 15, 3, 28],
'count': [38, 39, 60, 57]
})
df['date'] = pd.to_datetime(df['date'])
user_id date day count
0 99 2019-04-01 1 38
1 99 2019-04-15 15 39
2 100 2019-04-03 3 60
3 100 2019-04-28 28 57
Output:
user_id date day count
0 99 2019-04-01 1 38
1 99 2019-04-02 2 0
...
13 99 2019-04-14 14 0
14 99 2019-04-15 15 39
15 99 2019-04-16 16 0
...
28 99 2019-04-29 29 0
29 99 2019-04-30 30 0
30 100 2019-04-01 1 0
31 100 2019-04-02 2 0
32 100 2019-04-03 3 60
33 100 2019-04-04 4 0
34 100 2019-04-05 5 0
...
54 100 2019-04-25 25 0
55 100 2019-04-26 26 0
56 100 2019-04-27 27 0
57 100 2019-04-28 28 57
58 100 2019-04-29 29 0
59 100 2019-04-30 30 0
Complete Working Example:
import pandas as pd
df = pd.DataFrame({
'user_id': [99, 99, 100, 100],
'date': ['2019-04-01', '2019-04-15', '2019-04-03', '2019-04-28'],
'day': [1, 15, 3, 28],
'count': [38, 39, 60, 57]
})
df['date'] = pd.to_datetime(df['date'])
# Columns to Reindex
cols = ['user_id', 'date']
# Create Multi Index
midx = pd.MultiIndex.from_product(
(df['user_id'].unique(),
pd.date_range(start='2019-04-01', end='2019-04-30', freq='D')),
names=cols
)
# Reindex
df = (
df.set_index(cols)
.reindex(midx, fill_value=0)
.reset_index()
)
# Fix Day Column
df['day'] = df['date'].dt.day
print(df)
Upvotes: 1
Reputation: 2918
def fill_missing_dates(df):
return (
df.set_index("date")
.reindex(pd.date_range("2019-04-01", "2019-04-30"), fill_value=0)
.assign(day=lambda x: x.index.day) # set correct day column
.reset_index()
)
df.groupby("user_id", as_index=False).apply(fill_missing_dates).reset_index(drop=True)
Upvotes: 0