I am trying to add missing dates to my dataframe.
I have seen these posts: reindex and reindex2.
When I try to reindex my dataframe:
print(df)
df = df.reindex(dates, fill_value=0)
print(df)
I get the following output:
_updated_at Name hour day date time data1 data2
06/06/2016 13:27 game_name 13 6 06/06/2016 evening 0 0
07/06/2016 10:33 game_name 10 7 07/06/2016 morning 145.2788 122.7361
18/10/2016 14:34 game_name 14 18 18/10/2016 evening 0 0
19/10/2016 17:12 game_name 17 19 19/10/2016 evening 0 0
24/10/2016 11:05 game_name 11 24 24/10/2016 morning 313.5954 364.4107
24/10/2016 12:02 game_name 12 24 24/10/2016 evening 0 0
25/10/2016 08:50 game_name 8 25 25/10/2016 morning 362.4682 431.5803
25/10/2016 13:00 game_name 13 25 25/10/2016 evening 0 0
_updated_at Name hour day date time data1 data2
24/10/2016 0 0 0 0 0 0 0
25/10/2016 0 0 0 0 0 0 0
26/10/2016 0 0 0 0 0 0 0
27/10/2016 0 0 0 0 0 0 0
28/10/2016 0 0 0 0 0 0 0
29/10/2016 0 0 0 0 0 0 0
30/10/2016 0 0 0 0 0 0 0
I expected a new row, filled with 0 in each column, to be added for every missing date, rather than every existing row being replaced with 0.
EDIT: The overall goal is to calculate the difference between the morning and evening values for each day.
EDIT2: Current output:
print (df.reindex(mux, fill_value=0).groupby(level=0)['data1'].diff(-1).dropna())
dtypes: float64(2)None
2016-06-06 morning 0.00000
2016-06-07 morning 440.99582
2016-06-08 morning 0.00000
2016-06-09 morning 0.00000
2016-06-10 morning 0.00000
print (df.reindex(mux, fill_value=0).groupby(level=0)['data2'].diff(-1).dropna())
Length: 142, dtype: float64
2016-06-06 morning -220.5481
2016-06-07 morning 0.0000
2016-06-08 morning 0.0000
2016-06-09 morning 0.0000
2016-06-10 morning 0.0000
2016-06-11 morning 0.0000
I was expecting to see evening values.
Reputation: 862761
You can reindex by a MultiIndex created with MultiIndex.from_product from the full range of dates and the two time values (morning and evening):
# the sample dates are dd/mm/yyyy (e.g. 18/10/2016), so parse them day-first
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
dates = pd.date_range(start=df.date.min(), end=df.date.max())
print (dates)
DatetimeIndex(['2016-06-06', '2016-06-07', '2016-06-08', '2016-06-09',
'2016-06-10', '2016-06-11', '2016-06-12', '2016-06-13',
'2016-06-14', '2016-06-15',
...
'2016-10-16', '2016-10-17', '2016-10-18', '2016-10-19',
'2016-10-20', '2016-10-21', '2016-10-22', '2016-10-23',
'2016-10-24', '2016-10-25'],
dtype='datetime64[ns]', length=142, freq='D')
mux = pd.MultiIndex.from_product([dates,['morning','evening']])
#print (mux)
df.set_index(['date','time'], inplace=True)
print (df.reindex(mux, fill_value=0))
_updated_at Name hour day data1 data2
2016-06-06 morning 0 0 0 0 0.0000 0.0000
evening 06/06/2016 13:27 game_name 13 6 0.0000 0.0000
2016-06-07 morning 0 0 0 0 0.0000 0.0000
evening 0 0 0 0 0.0000 0.0000
2016-06-08 morning 0 0 0 0 0.0000 0.0000
evening 0 0 0 0 0.0000 0.0000
2016-06-09 morning 0 0 0 0 0.0000 0.0000
evening 0 0 0 0 0.0000 0.0000
2016-06-10 morning 0 0 0 0 0.0000 0.0000
evening 0 0 0 0 0.0000 0.0000
2016-06-11 morning 0 0 0 0 0.0000 0.0000
evening 0 0 0 0 0.0000 0.0000
2016-06-12 morning 0 0 0 0 0.0000 0.0000
evening 0 0 0 0 0.0000 0.0000
2016-06-13 morning 0 0 0 0 0.0000 0.0000
...
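To check the behaviour on something small, here is a minimal self-contained sketch. The four rows are copied from the sample in the question; `dayfirst=True` and the level names `date` and `time` are my assumptions, not something the question confirms.

```python
import pandas as pd

# Four rows copied from the question's sample (dates look like dd/mm/yyyy).
df = pd.DataFrame({
    "date": ["06/06/2016", "07/06/2016", "24/10/2016", "24/10/2016"],
    "time": ["evening", "morning", "morning", "evening"],
    "data1": [0.0, 145.2788, 313.5954, 0.0],
    "data2": [0.0, 122.7361, 364.4107, 0.0],
})

# Assumption: day-first dates, based on values such as 18/10/2016.
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# Build the full calendar of days between the first and last date.
dates = pd.date_range(start=df["date"].min(), end=df["date"].max())

# reindex matches on index labels, so date/time must become the index first.
df = df.set_index(["date", "time"])

# Every (day, time) combination, in morning-then-evening order.
mux = pd.MultiIndex.from_product([dates, ["morning", "evening"]],
                                 names=["date", "time"])

# Existing rows keep their values; missing combinations are filled with 0.
full = df.reindex(mux, fill_value=0)
print(full.head(4))
```

The original attempt most likely returned only zeros because reindex looks the new labels up in the existing index, and at that point the index did not contain matching dates, so every row was treated as missing.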
Finally, you can group by the first level of the MultiIndex (the dates) and use DataFrameGroupBy.diff. Each date then gets one row with NaN, which can be removed by dropna:
# select several columns with a list; tuple-style ['data1','data2'] was removed in pandas 2.0
print (df.reindex(mux, fill_value=0).groupby(level=0)[['data1','data2']].diff(-1).dropna())
data1 data2
2016-06-06 morning 0.0000 0.0000
2016-06-07 morning 0.0000 0.0000
2016-06-08 morning 0.0000 0.0000
2016-06-09 morning 0.0000 0.0000
2016-06-10 morning 0.0000 0.0000
2016-06-11 morning 0.0000 0.0000
2016-06-12 morning 0.0000 0.0000
2016-06-13 morning 0.0000 0.0000
2016-06-14 morning 0.0000 0.0000
2016-06-15 morning 0.0000 0.0000
2016-06-16 morning 0.0000 0.0000
2016-06-17 morning 0.0000 0.0000
2016-06-18 morning 0.0000 0.0000
2016-06-19 morning 0.0000 0.0000
2016-06-20 morning 0.0000 0.0000
2016-06-21 morning 0.0000 0.0000
...
...
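Regarding the missing evening rows: as far as I can tell this is how diff(-1) works. It computes "this row minus the next row" within each group, so the difference lands on the morning row and the evening row becomes NaN, which dropna then removes. A small sketch with the two last days from the sample (the level names `date` and `time` are assumptions of mine):

```python
import pandas as pd

# Two complete days, values taken from the question's sample rows.
dates = pd.to_datetime(["2016-10-24", "2016-10-25"])
mux = pd.MultiIndex.from_product([dates, ["morning", "evening"]],
                                 names=["date", "time"])
full = pd.DataFrame({"data1": [313.5954, 0.0, 362.4682, 0.0],
                     "data2": [364.4107, 0.0, 431.5803, 0.0]}, index=mux)

# diff(-1): current row minus the following row inside each date group.
# Morning rows hold morning - evening; evening rows have no next row -> NaN.
per_day = full.groupby(level="date")[["data1", "data2"]].diff(-1)

# Dropping the NaN rows leaves exactly one (morning-labelled) row per day.
result = per_day.dropna()
print(result)
```

So the label says morning, but the value is already the morning minus evening difference for that day.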
You can also select the two rows of each day by position and subtract (iloc is used here because ix was removed in pandas 1.0):
print (df.reindex(mux, fill_value=0)
         .groupby(level=0)[['data1','data2']]
         .apply(lambda x: x.iloc[0] - x.iloc[1]))
data1 data2
2016-06-06 0.0000 0.0000
2016-06-07 0.0000 0.0000
2016-06-08 0.0000 0.0000
2016-06-09 0.0000 0.0000
2016-06-10 0.0000 0.0000
2016-06-11 0.0000 0.0000
2016-06-12 0.0000 0.0000
2016-06-13 0.0000 0.0000
2016-06-14 0.0000 0.0000
2016-06-15 0.0000 0.0000
2016-06-16 0.0000 0.0000
2016-06-17 0.0000 0.0000
2016-06-18 0.0000 0.0000
2016-06-19 0.0000 0.0000
2016-06-20 0.0000 0.0000
2016-06-21 0.0000 0.0000
2016-06-22 0.0000 0.0000
2016-06-23 0.0000 0.0000
2016-06-24 0.0000 0.0000
2016-06-25 0.0000 0.0000
2016-06-26 0.0000 0.0000
2016-06-27 0.0000 0.0000
2016-06-28 0.0000 0.0000
2016-06-29 0.0000 0.0000
2016-06-30 0.0000 0.0000
2016-07-01 0.0000 0.0000
2016-07-02 0.0000 0.0000
2016-07-03 0.0000 0.0000
2016-07-04 0.0000 0.0000
2016-07-05 0.0000 0.0000
... ...
2016-09-26 0.0000 0.0000
2016-09-27 0.0000 0.0000
2016-09-28 0.0000 0.0000
2016-09-29 0.0000 0.0000
2016-09-30 0.0000 0.0000
2016-10-01 0.0000 0.0000
2016-10-02 0.0000 0.0000
2016-10-03 0.0000 0.0000
2016-10-04 0.0000 0.0000
2016-10-05 0.0000 0.0000
2016-10-06 0.0000 0.0000
2016-10-07 0.0000 0.0000
2016-10-08 0.0000 0.0000
2016-10-09 0.0000 0.0000
2016-10-10 0.0000 0.0000
2016-10-11 0.0000 0.0000
2016-10-12 0.0000 0.0000
2016-10-13 0.0000 0.0000
2016-10-14 0.0000 0.0000
2016-10-15 0.0000 0.0000
2016-10-16 0.0000 0.0000
2016-10-17 0.0000 0.0000
2016-10-18 0.0000 0.0000
2016-10-19 0.0000 0.0000
2016-10-20 0.0000 0.0000
2016-10-21 0.0000 0.0000
2016-10-22 0.0000 0.0000
2016-10-23 0.0000 0.0000
2016-10-24 313.5954 364.4107
2016-10-25 362.4682 431.5803
[142 rows x 2 columns]
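If you want to see both values side by side before subtracting, one possible alternative (a different technique from the two above, not something the question asked for) is to unstack the time level into columns. This is only a sketch on two sample days; the level names `date` and `time` are my assumptions.

```python
import pandas as pd

# Two complete days, values taken from the question's sample rows.
dates = pd.to_datetime(["2016-10-24", "2016-10-25"])
mux = pd.MultiIndex.from_product([dates, ["morning", "evening"]],
                                 names=["date", "time"])
full = pd.DataFrame({"data1": [313.5954, 0.0, 362.4682, 0.0],
                     "data2": [364.4107, 0.0, 431.5803, 0.0]}, index=mux)

# Move the time level into columns: one row per date,
# one column each for 'evening' and 'morning'.
wide = full["data1"].unstack("time")

# Column arithmetic gives one difference per date.
data1_diff = wide["morning"] - wide["evening"]
print(data1_diff)
```

Swapping the two operands gives evening minus morning instead, without depending on row order.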