Eli Turasky
Eli Turasky

Reputation: 1061

Pandas resample won't change time increments

I have a dictionary with 4 different keys that represent a different crop (corn, soybeans, winter wheat, and spring wheat). Each key has 10 different dataarrays as the values within the dictionary (temperature, 24 hour temperature change, etc). I then want to create two new dictionaries from the data, seperating out the actual values (like temperature, precipitation) from the 24 hour change values. The data is 6-hourly.

corn=glob.glob('/Users/eli/Documents/Python_data/plotly_practice/20200812_00/20200812_00_ec_ens_*'+'corn'+'_timeseries.nc')
soybean=glob.glob('/Users/eli/Documents/Python_data/plotly_practice/20200812_00/20200812_00_ec_ens_*'+'soybeans'+'_timeseries.nc')
winterwheat=glob.glob('/Users/eli/Documents/Python_data/plotly_practice/20200812_00/20200812_00_ec_ens_*'+'winterwheat'+'_timeseries.nc')
springwheat=glob.glob('/Users/eli/Documents/Python_data/plotly_practice/20200812_00/20200812_00_ec_ens_*'+'springwheat'+'_timeseries.nc')
all_files=[corn, soybean,winterwheat,springwheat]

crop_names=['corn', 'soybeans', 'winterwheat', 'springwheat']
data={}
for i in crop_names:
    for j in all_files:
        data[i]=xr.open_mfdataset(j)

I create two empty dictionaries and then iterate over the keys.

today=dt.date.today()
df_vals={}
df_deltas={}
for i in data.keys():

Next, I fill the df_vals.

    df_vals[str(i)]=data[i].to_dataframe().reset_index()
    df_vals[i]['time']=pd.date_range((today-dt.timedelta(days=1)), (today+dt.timedelta(days=14)), freq='6H')

Then, I want to fill the df_deltas. However, I want to do this one a little differently. For the deltas I am concerned about the 24 hour change so I need to apply a rolling mean or sum, depending on if the variable is temperature or precipitation.

    df_deltas[i]=df_vals[i].filter(regex='delta')
    df_deltas[i]['time']=pd.date_range((today-dt.timedelta(days=1)), (today+dt.timedelta(days=14)), freq='6H')
    df_deltas[i]=df_deltas[i].set_index('time')
    df_deltas[i].loc[:, df_deltas[i].columns.str.contains('precip')]=df_deltas[i].resample('24H').sum()
    df_deltas[i].loc[:, df_deltas[i].columns.str.contains('temp')]=df_deltas[i].resample('24H').mean()
    df_deltas[i]=df_deltas[i].reset_index()

While the calculations are done correctly, the updated dataframe does not downscale the time. Here is the output for one crop.

df_deltas['corn]

time    2m_temp_24hdelta_prod   2m_temp_24hdelta_area   total_precip_24hdelta_prod  total_precip_24hdelta_area
0   2020-08-13 00:00:00 0.228715    0.161631    -0.650041   -0.552645
1   2020-08-13 06:00:00 NaN NaN NaN NaN
2   2020-08-13 12:00:00 NaN NaN NaN NaN
3   2020-08-13 18:00:00 NaN NaN NaN NaN
4   2020-08-14 00:00:00 0.676321    0.214109    -1.312289   -1.020344

How do I force the time to collapse, and thus get rid of all the nans?

Upvotes: 0

Views: 50

Answers (1)

Joran Beasley
Joran Beasley

Reputation: 114068

resampled_df = df_deltas[['precip','temp']].resample('24h').agg({'precip':'sum','temp':'mean'})

you need to have all series in a df have the same index

you could instead do something like

interesting_cols = [c for c in df_deltas.columns if "precip" in c or "temp" in c]
aggs = {c:'sum' if 'precip' in c else 'mean' for c in interesting_cols}
df_deltas[columns].resample('24h').agg(aggs)

Upvotes: 2

Related Questions