Reputation: 18022
I have a Dataframe
with three columns store, hour, count
. The problem I'm facing is some hours
are missing for some stores
and I want them to be 0
.
This is how the dataframe
looks like
# store_id hour count
# 0 13 0 56
# 1 13 1 78
# 2 13 2 53
# 3 23 13 14
# 4 23 14 13
As you can see for the store
with id 13 doesn't have values for hours 3-23, similarly with store 23 it doesn't have values for many other hours.
I tried to solve this by creating a temporal dataframe with two columns id
and count
and performing a right outer join
, but didn't work.
Upvotes: 3
Views: 55
Reputation: 8378
Try this:
all_hours = set(range(24))
for sid in set(df['store_id']):
misshours = list(all_hours - set(df['hour'][df['store_id'] == sid]))
nmiss = len(misshours)
df = pandas.concat([df, DataFrame({'store_id': nmiss * [sid], misshours, 'count': nmiss * [0]})])
Upvotes: 1
Reputation: 862511
If typo and no duplicates in hour
per groups, solution is reindex
with MultiIndex.from_product
:
df = df.set_index(['store_id','hour'])
mux = pd.MultiIndex.from_product([df.index.levels[0], range(23)], names=df.index.names)
df = df.reindex(mux, fill_value=0).reset_index()
print (df)
store_id hour count
0 13 0 56
1 13 1 78
2 13 2 53
3 13 3 0
4 13 4 0
5 13 5 0
6 13 6 0
7 13 7 0
8 13 8 0
9 13 9 0
10 13 10 0
11 13 11 0
12 13 12 0
13 13 13 0
14 13 14 0
15 13 15 0
16 13 16 0
17 13 17 0
18 13 18 0
19 13 19 0
20 13 20 0
21 13 21 0
22 13 22 0
23 23 0 0
24 23 1 0
25 23 2 0
26 23 3 0
27 23 4 0
28 23 5 0
29 23 6 0
30 23 7 0
31 23 8 0
32 23 9 0
33 23 10 0
34 23 11 0
35 23 12 0
36 23 13 14
37 23 14 0
38 23 15 0
39 23 16 0
40 23 17 0
41 23 18 0
42 23 19 0
43 23 20 0
44 23 21 0
45 23 22 0
Upvotes: 2