David

Reputation: 1267

Different behaviour with resample and asfreq in pandas

I have a dataframe like this:

                            A        B    value
2014-11-14 12:00:00      30.5    356.3      344
2014-11-15 00:00:00      30.5    356.3      347
2014-11-15 12:00:00      30.5    356.3      356
2014-11-16 00:00:00      30.5    356.3      349
...
2017-01-06 00:00:00      30.5    356.3      347

and I want to be sure that from beginning to end there are no missing times (i.e., the index advances in 12-hour steps with no larger jumps). If a timestamp is missing, for instance 2015-12-12 12:00:00, I would like to add a row like this:

...
2015-12-12 00:00:00     30.5    356.3    323
2015-12-12 12:00:00     30.5    356.3    NaN  *<- add this*
2015-12-13 00:00:00     30.5    356.3    347

How to do this was answered by @ted-petrou in "Resampling dataframe in pandas as a checking operation". The solution was:

df1= df.asfreq('12H')
df1[['A','B']] = df1[['A','B']].fillna(method='ffill')

My question: Can I do it with resample instead of asfreq? Doing

df1= df.resample('12H')
df1[['A','B']] = df1[['A','B']].fillna(method='ffill')

I get ValueError: cannot set items on DatetimeIndexResampler. I don't understand why. Aren't resample and asfreq the same operation in this particular case? What am I missing? Thank you in advance.

Upvotes: 1

Views: 1107

Answers (1)

Nickil Maveli

Reputation: 29711

Keep in mind that DF.resample() is a time-based groupby: it must be followed by a method (a reduction, a fill, or .asfreq()) that is applied to each of its groups.

On its own, the call only creates a Resampler object, just as calling DF.rolling() only creates a Rolling object. Nothing is computed yet:

df[['A', 'B']].resample('12H')
DatetimeIndexResampler [freq=<12 * Hours>, axis=0, closed=left, label=left, convention=start, base=0]

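You need to call a method on it, such as an aggregation or a fill, so that it knows what to compute for each group; only then do you get a DataFrame back. That is why the item assignment in your code fails: df1 is still a Resampler.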
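As a minimal sketch of that difference, the snippet below builds a small hypothetical frame (the three rows and the lowercase "12h" alias are assumptions for illustration; recent pandas releases deprecate the uppercase "H") and checks what each step returns:

```python
import pandas as pd

# Hypothetical sample data: a 12-hourly index with one timestamp missing.
idx = pd.to_datetime([
    "2015-12-12 00:00:00",
    "2015-12-13 00:00:00",  # 2015-12-12 12:00:00 is absent
    "2015-12-13 12:00:00",
])
df = pd.DataFrame({"A": 30.5, "B": 356.3, "value": [323, 347, 350]}, index=idx)

# resample() alone returns a lazy Resampler, not a DataFrame.
resampler = df.resample("12h")
is_frame_before = isinstance(resampler, pd.DataFrame)

# Calling a method on the Resampler produces an actual DataFrame.
result = resampler.asfreq()
is_frame_after = isinstance(result, pd.DataFrame)

print(type(resampler).__name__, type(result).__name__)
print(result)
```

Only the second object supports column assignment such as df1[['A','B']] = ...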

In order to do this for your case:

1) Use .resample().ffill() on the two columns and then join the result with the third one. Since the third column wasn't resampled, its rows for the newly added timestamps are filled with NaN.

df[['A', 'B']].resample('12H').ffill().join(df['value'])

2) Use .resample() with .asfreq() as the method applied to it, similar to what you have done:

df1 = df.resample('12H').asfreq()
df1[['A','B']] = df1[['A','B']].ffill()
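Both options can be tried side by side on a small frame. This is only a sketch: the sample rows are made up, and "12h" is used instead of "12H" because newer pandas versions prefer the lowercase alias.

```python
import pandas as pd

# Hypothetical sample data: a 12-hourly index with one timestamp missing.
idx = pd.to_datetime([
    "2015-12-12 00:00:00",
    "2015-12-13 00:00:00",  # 2015-12-12 12:00:00 is absent
    "2015-12-13 12:00:00",
])
df = pd.DataFrame({"A": 30.5, "B": 356.3, "value": [323, 347, 350]}, index=idx)

# Option 1: forward-fill A and B while resampling, then join the untouched column.
opt1 = df[["A", "B"]].resample("12h").ffill().join(df["value"])

# Option 2: put every column on the 12-hour grid, then forward-fill A and B only.
opt2 = df.resample("12h").asfreq()
opt2[["A", "B"]] = opt2[["A", "B"]].ffill()

print(opt1)
print(opt2)
```

In both results the inserted row carries the forward-filled A and B, while value stays NaN.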

Note: Using .asfreq() directly may be more suitable than .resample() here, since the end goal is frequency conversion rather than aggregating groups.
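For the checking use case, a rough sketch with plain .asfreq() could look like the following. The sample rows and the names full and added are hypothetical. Index.difference is used here to list the timestamps that had to be inserted.

```python
import pandas as pd

# Hypothetical sample data: a 12-hourly index with one timestamp missing.
idx = pd.to_datetime([
    "2015-12-12 00:00:00",
    "2015-12-13 00:00:00",  # 2015-12-12 12:00:00 is absent
    "2015-12-13 12:00:00",
])
df = pd.DataFrame({"A": 30.5, "B": 356.3, "value": [323, 347, 350]}, index=idx)

# Convert to a regular 12-hour grid; absent timestamps become all-NaN rows.
full = df.asfreq("12h")
full[["A", "B"]] = full[["A", "B"]].ffill()

# Timestamps present in the regular grid but not in the original index.
added = full.index.difference(df.index)
print(added)
```

An empty added index would mean the original data already had no gaps.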

Upvotes: 1
