TypeError when resampling a pandas dataframe

Question

I would like to resample every 4th row in a Pandas dataframe. As suggested in "How to select every 4th row in a pandas dataframe and calculate the rolling average", I use the following code:

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow import keras
from matplotlib import pyplot as plt



#Read the input data
df_generation = pd.read_csv("C:/Users/Data/Electricity Price Forecasting/Generation.csv", sep =";")
print(df_generation.dtypes)
df_generation_short = df_generation[0:2000]
df_generation_short['Time'] = pd.to_datetime(df_generation_short['Time'])

new = df_generation_short['Biomass'].resample('1H').mean()

I convert the Time column of the original dataframe to datetime because otherwise pandas treats it as an object column (as recommended in another post). However, I still get the error message:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

Before the error, I also get this warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_generation_short['Time'] = pd.to_datetime(df_generation_short['Time'])
Traceback (most recent call last):
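The warning comes from assigning a new column to df_generation_short, which pandas may treat as a slice of df_generation rather than as an independent DataFrame. A minimal sketch of one common way to avoid it, taking an explicit copy when slicing, is shown below; the small DataFrame is a made-up stand-in for the CSV data. Note that the warning is separate from the TypeError: removing it does not make resample work.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for df_generation (invented timestamps and values)
df_generation = pd.DataFrame({
    "Time": ["2021-01-01 00:00:00", "2021-01-01 00:15:00", "2021-01-01 00:30:00"],
    "Biomass": [10.0, 12.0, 14.0],
})

# .copy() makes df_generation_short an independent DataFrame instead of a
# slice of df_generation, so adding/replacing a column does not warn
df_generation_short = df_generation[0:2000].copy()

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning here would raise instead
    df_generation_short["Time"] = pd.to_datetime(df_generation_short["Time"])

print(df_generation_short.dtypes)
```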

[Screenshot of the dataframe]

Do you know why I get this error and how I can solve the problem? I'd appreciate any comments.

Update: I tried the suggestion from one comment and used the apply function, df_generation_short.apply(pd.to_datetime(df_generation_short['Time'])), but I get the error message "ValueError: no results". Does anyone have another idea how to solve the problem? Somehow pandas does not accept the column "Time" as a datetime index, although I convert it with df_generation_short['Time'] = pd.to_datetime(df_generation_short['Time']).

dallonsi · Accepted Answer

To sum up our conversation:

This line new = df_generation_short['Biomass'].resample('1H').mean() throws the TypeError:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
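To see where the error comes from, here is a minimal reproduction on made-up data (the timestamps and values are hypothetical). Even after Time has been converted with pd.to_datetime, it is only an ordinary column; the DataFrame keeps its default RangeIndex, and that index is what resample checks.

```python
import pandas as pd

# Hypothetical sample: Time already converted to datetime, as in the question
df = pd.DataFrame({
    "Time": pd.to_datetime(["2021-01-01 00:00:00", "2021-01-01 00:30:00"]),
    "Biomass": [1.0, 2.0],
})

print(df["Time"].dtype)         # the column is a datetime64 dtype ...
print(type(df.index).__name__)  # ... but the index is still a RangeIndex

error_message = None
try:
    df["Biomass"].resample("1h").mean()
except TypeError as exc:
    # resample bins rows by the index, not by a datetime column
    error_message = str(exc)
    print("TypeError:", error_message)
```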

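This is because resample groups rows by the index, and your DataFrame's index is still the default RangeIndex (0, 1, 2, ...). Converting the Time column with pd.to_datetime changes that column's type but does not make it the index, and the Biomass column itself contains numbers, not dates. Thus, to solve this problem, set your DataFrame index to the column Time: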

df_generation_short = df_generation_short.set_index('Time')
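If the data is always resampled by time, the index can also be set while the file is read. The sketch below assumes Generation.csv has a Time column in a format pd.read_csv can parse; the inline CSV text is invented sample data standing in for the real file.

```python
import io

import pandas as pd

# Hypothetical excerpt in the same ";"-separated layout as Generation.csv
csv_text = """Time;Biomass;Fossil Oil
2021-01-01 00:00:00;1.0;10.0
2021-01-01 00:30:00;2.0;20.0
2021-01-01 01:00:00;3.0;30.0
2021-01-01 01:30:00;4.0;40.0
"""

# parse_dates converts Time while reading; index_col makes it the index
df_generation = pd.read_csv(
    io.StringIO(csv_text), sep=";", parse_dates=["Time"], index_col="Time"
)

print(type(df_generation.index).__name__)  # DatetimeIndex

hourly_biomass = df_generation["Biomass"].resample("1h").mean()
print(hourly_biomass)
```

With this approach the separate pd.to_datetime and set_index steps are not needed.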

Now, if you want to get the mean values of Biomass over 1-hour intervals:

new = df_generation_short['Biomass'].resample('1H').mean()
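Putting the two steps together on hypothetical quarter-hourly data (the timestamps and values are invented) shows the index changing type and the hourly means being computed. The sketch uses the lowercase "1h" alias, since recent pandas versions deprecate the uppercase "1H" in its favor.

```python
import pandas as pd

# Hypothetical quarter-hourly readings spanning two hours
df_generation_short = pd.DataFrame({
    "Time": pd.date_range("2021-01-01 00:00", periods=8, freq="15min"),
    "Biomass": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Move Time into the index; this replaces the default RangeIndex
df_generation_short = df_generation_short.set_index("Time")
print(type(df_generation_short.index).__name__)  # DatetimeIndex

# Average all Biomass readings that fall within each clock hour
new = df_generation_short["Biomass"].resample("1h").mean()
print(new)
```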

Moreover, if you want to compute this mean over all columns, simply omit the column selection:

new = df_generation_short.resample('1H').mean()
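One caveat, depending on the pandas version: if the DataFrame still contains text columns, resample(...).mean() may raise an error instead of silently skipping them, because pandas 2.x no longer drops such columns automatically. A hedged workaround is to keep only the numeric columns first; the "Comment" column below is a hypothetical non-numeric column added for illustration.

```python
import pandas as pd

# Hypothetical frame: two numeric columns plus a text column
df = pd.DataFrame(
    {
        "Biomass": [1.0, 2.0, 3.0, 4.0],
        "Fossil Oil": [10.0, 20.0, 30.0, 40.0],
        "Comment": ["a", "b", "c", "d"],
    },
    index=pd.date_range("2021-01-01 00:00", periods=4, freq="30min"),
)

# Keep only numeric columns so mean() never sees values it cannot average
hourly_all = df.select_dtypes("number").resample("1h").mean()
print(hourly_all)
```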

Or, for two specific columns, for instance "Biomass" and "Fossil Oil":

new = df_generation_short[["Biomass", "Fossil Oil"]].resample('1H').mean()
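As an alternative to set_index, DataFrame.resample also accepts an on= argument naming a datetime column to bin by, so the DataFrame can keep its RangeIndex. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical data with Time as an ordinary datetime column
df = pd.DataFrame({
    "Time": pd.date_range("2021-01-01 00:00", periods=4, freq="30min"),
    "Biomass": [1.0, 2.0, 3.0, 4.0],
    "Fossil Oil": [10.0, 20.0, 30.0, 40.0],
})

# on="Time" tells resample which column to bin by; df's own index is unchanged
hourly = df[["Time", "Biomass", "Fossil Oil"]].resample("1h", on="Time").mean()
print(hourly)
```

The result is indexed by Time either way; set_index is handy if the datetime index is useful for later operations, while on= leaves the original DataFrame untouched.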
