Noah Watkins

Reputation: 5540

Adding missing data to dataframe grouped by date

I have a pandas DataFrame with a datetime column named time. I'd like to count the number of rows per hour. The problem is that I'd like the resulting table to include hours for which no rows exist. Example:

    time    id  lat lon type
0   2017-06-09 19:34:59.945128-07:00    75  36.999866   -122.058180 UPPER CAMPUS
1   2017-06-09 19:53:56.387058-07:00    75  36.979664   -122.058900 OUT OF SERVICE/SORRY
2   2017-06-09 19:28:53.525189-07:00    75  36.988640   -122.066820 UPPER CAMPUS
3   2017-06-09 19:30:31.633478-07:00    75  36.991657   -122.066605 UPPER CAMPUS

I can get these values using df.groupby(df.time.dt.hour).count() which produces:

    time    id  lat lon type
time                    
0   2121    2121    2121    2121    2121
1   2334    2334    2334    2334    2334
2   1523    1523    1523    1523    1523
6   8148    8148    8148    8148    8148

This is correct: the index values 0, 1, 2 and 6 are hours of the day. However, I'd like the result to show that there are no rows for hours 3, 4 and 5. Repeating the count under every column name is also unnecessary, since the value is the same for each.

Upvotes: 1

Views: 32

Answers (1)

jezrael

Reputation: 862641

You can use reindex:

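# if you want all hours of the day (0 to 23)
# any column without missing values can be counted; 'id' is one of the columns from the question
df1 = df.groupby(df.time.dt.hour)['id'].count().reindex(range(24), fill_value=0)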

# if you want hours from 0 up to the maximum hour present
# (parentheses let the chained call span two lines)
df1 = (df.groupby(df.time.dt.hour).count()
         .reindex(range(df.time.dt.hour.max() + 1), fill_value=0))
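For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same reindex idea. The sample timestamps are made up for illustration, and it uses size() instead of count() on a chosen column, which is a substitution on my part: size() returns one count per group, so no column needs to be selected.

```python
import pandas as pd

# Hypothetical sample data: rows fall in hours 0, 1, 2 and 6 only.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-06-09 00:15:00",
        "2017-06-09 00:45:00",
        "2017-06-09 01:10:00",
        "2017-06-09 02:30:00",
        "2017-06-09 06:05:00",
        "2017-06-09 06:50:00",
    ]),
    "id": [75, 75, 75, 75, 75, 75],
})

# size() counts rows per group and returns a single Series,
# so the repeated per-column counts disappear.
counts = df.groupby(df["time"].dt.hour).size()

# Reindex over every hour of the day; hours with no rows get 0.
hourly = counts.reindex(range(24), fill_value=0)

print(hourly.head(7).tolist())  # [2, 1, 1, 0, 0, 0, 2]
```

Note that count() skips missing values in the selected column while size() counts every row, so the two can differ if the data contains NaN.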

Upvotes: 1
