Reputation: 852
I would like to resample every 4th row in a pandas DataFrame. As suggested in "How to select every 4th row in a pandas dataframe and calculate the rolling average", I use the following code:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow import keras
from matplotlib import pyplot as plt
#Read the input data
df_generation = pd.read_csv("C:/Users/Data/Electricity Price Forecasting/Generation.csv", sep =";")
print(df_generation.dtypes)
df_generation_short = df_generation[0:2000]
df_generation_short['Time'] = pd.to_datetime(df_generation_short['Time'])
new = df_generation_short['Biomass'].resample('1H').mean()
I convert the column Time in the original DataFrame into a datetime because otherwise pandas sees it as an object type (as recommended in a linked answer). However, I still get the error message:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
I also get a warning before the error telling me:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_generation_short['Time'] = pd.to_datetime(df_generation_short['Time'])
Traceback (most recent call last):
Here you can see a screenshot of the dataframe
Do you know why I get this error and how I can solve the problem? I'd appreciate every comment.
Update: I tried it with the suggestion from one comment and used the apply function:
df_generation_short.apply(pd.to_datetime(df_generation_short['Time']))
but I get the error message "ValueError: no results". Does anyone have another idea how to solve the problem? Somehow pandas does not accept the column "Time" as a date object with an index, although I convert it by using df_generation_short['Time'] = pd.to_datetime(df_generation_short['Time']).
Upvotes: 0
Views: 391
Reputation: 1524
To sum up our conversation:
new = df_generation_short['Biomass'].resample('1H').mean()
throws TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
because the column Biomass is indexed by the default RangeIndex, which does not contain dates. Converting the column Time to datetime does not change the index. To solve this problem, set your DataFrame index to the column Time:
df_generation_short = df_generation_short.set_index('Time')
Then, to get the mean of Biomass in a window of 1 hour:
new = df_generation_short['Biomass'].resample('1H').mean()
To get it for every column of the DataFrame:
new = df_generation_short.resample('1H').mean()
Or if you want it for two specific columns: "Biomass" and "Fossil Oil" for instance:
new = df_generation_short[["Biomass", "Fossil Oil"]].resample('1H').mean()
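Putting the steps together, here is a minimal sketch on made-up data. The small DataFrame stands in for Generation.csv (its values and the 15-minute spacing are assumptions, not taken from your file), and the slice is copied with .copy(), which should also make the SettingWithCopyWarning go away. The rule is written as '60min' rather than '1H' only because the spelling of the hour alias ('H' vs 'h') depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Generation.csv: 15-minute readings, 'Time' stored as strings.
df_generation = pd.DataFrame({
    "Time": pd.date_range("2021-01-01", periods=8, freq="15min").astype(str),
    "Biomass": np.arange(8, dtype=float),
    "Fossil Oil": np.arange(8, dtype=float) * 10,
})

# .copy() makes the slice an independent DataFrame, so assigning a column
# to it no longer triggers SettingWithCopyWarning.
df_generation_short = df_generation[0:2000].copy()
df_generation_short["Time"] = pd.to_datetime(df_generation_short["Time"])

# Alternative that leaves the index alone: tell resample which column holds the dates.
hourly_on = df_generation_short.resample("60min", on="Time")[["Biomass", "Fossil Oil"]].mean()

# resample() without `on` needs a DatetimeIndex, so move 'Time' into the index.
df_generation_short = df_generation_short.set_index("Time")
hourly = df_generation_short[["Biomass", "Fossil Oil"]].resample("60min").mean()

print(hourly)
```

Both variants give one row per hour containing the mean of the four 15-minute readings in that hour. If you literally want every 4th row instead of an hourly mean, df_generation_short.iloc[::4] selects them without any datetime index.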
Upvotes: 1