kosijnova
kosijnova

Reputation: 3

How to fill missing time values in Pandas DataFrame?

enter image description here

The third column shows the time data for each user from 04/01/2019 to 04/30/2019. For some users, some days are missing. What can I do to keep the monthly range for each user, with the remaining columns being filled with zeros?

Upvotes: 0

Views: 123

Answers (2)

Henry Ecker
Henry Ecker

Reputation: 35626

Try MultiIndex.from_product with the unique user_ids and pd.date_range from start to end date:

cols = ['user_id', 'date']
midx = pd.MultiIndex.from_product(
    (df['user_id'].unique(),
     pd.date_range(start='2019-04-01', end='2019-04-30', freq='D')),
    names=cols
)

Then set_index + reindex:

df = (
    df.set_index(cols)
        .reindex(midx, fill_value=0)
        .reset_index()
)

Fix day column (optional):

df['day'] = df['date'].dt.day

Sample Frame:

df = pd.DataFrame({
    'user_id': [99, 99, 100, 100],
    'date': ['2019-04-01', '2019-04-15', '2019-04-03', '2019-04-28'],
    'day': [1, 15, 3, 28],
    'count': [38, 39, 60, 57]
})
df['date'] = pd.to_datetime(df['date'])
   user_id       date  day  count
0       99 2019-04-01    1     38
1       99 2019-04-15   15     39
2      100 2019-04-03    3     60
3      100 2019-04-28   28     57

Output:

    user_id       date  day  count
0        99 2019-04-01    1     38
1        99 2019-04-02    2      0
                  ...
13       99 2019-04-14   14      0
14       99 2019-04-15   15     39
15       99 2019-04-16   16      0
                  ...
28       99 2019-04-29   29      0
29       99 2019-04-30   30      0
30      100 2019-04-01    1      0
31      100 2019-04-02    2      0
32      100 2019-04-03    3     60
33      100 2019-04-04    4      0
34      100 2019-04-05    5      0
                  ...
54      100 2019-04-25   25      0
55      100 2019-04-26   26      0
56      100 2019-04-27   27      0
57      100 2019-04-28   28     57
58      100 2019-04-29   29      0
59      100 2019-04-30   30      0

Complete Working Example:

import pandas as pd

df = pd.DataFrame({
    'user_id': [99, 99, 100, 100],
    'date': ['2019-04-01', '2019-04-15', '2019-04-03', '2019-04-28'],
    'day': [1, 15, 3, 28],
    'count': [38, 39, 60, 57]
})
df['date'] = pd.to_datetime(df['date'])

# Columns to Reindex
cols = ['user_id', 'date']
# Create Multi Index
midx = pd.MultiIndex.from_product(
    (df['user_id'].unique(),
     pd.date_range(start='2019-04-01', end='2019-04-30', freq='D')),
    names=cols
)
# Reindex
df = (
    df.set_index(cols)
        .reindex(midx, fill_value=0)
        .reset_index()
)
# Fix Day Column
df['day'] = df['date'].dt.day

print(df)

Upvotes: 1

Alexander Volkovsky
Alexander Volkovsky

Reputation: 2918

Use groupby() + reindex()

def fill_missing_dates(df):
    return (
        df.set_index("date")
        .reindex(pd.date_range("2019-04-01", "2019-04-30"), fill_value=0)
        .assign(day=lambda x: x.index.day)  # set correct day column
        .reset_index()
    )

df.groupby("user_id", as_index=False).apply(fill_missing_dates).reset_index(drop=True)

Upvotes: 0

Related Questions