I have a CSV file with a timestamp column spanning several years:
1990-05-12 14:01
...
1999-01-10 10:00
where the time is in HH:MM format. I'm trying to extract each day's worth of data into a new CSV file. Here's my code:
import datetime
import pandas as pd
df = pd.read_csv("/home/parallels/Desktop/ewh_log/hpwh_log.csv", parse_dates=True)
# change timestamp column format
def extract_months_data(df):
    df = pd.to_datetime(df['timestamp'])
    print(df)
def write_to_csv(df):
    print('writing ..')
    # todo
x1 = pd.to_datetime(df['timestamp'], format='%m-%d %H:%M').notnull().all()
if x1:
    extract_months_data(df)
else:
    x2 = pd.to_datetime(df['timestamp'])
    x2 = x2.dt.strftime('%m-%d %H:%M')
    write_to_csv(df)
The problem is that when execution reaches these lines
def extract_months_data(df):
    df = pd.to_datetime(df['timestamp'])
I get the following error:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime
Is there an alternative way to do this with pandas without discarding the rest of the data? I saw posts suggesting errors='coerce', but that replaces the values it cannot parse with NaT.
Thanks
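One way to use errors='coerce' without losing anything is to write its result into a separate column, so the original strings stay untouched, and then inspect the rows that became NaT. A minimal sketch with hypothetical sample rows (the `parsed` column name is also illustrative); the malformed third value stands in for whatever entry in the real file fails to convert. With pandas' nanosecond-based datetimes, a year outside roughly 1677-09-21 to 2262-04-11 raises OutOfBoundsDatetime and would be caught the same way:

```python
import pandas as pd

# Hypothetical sample; the third value is malformed (month 13) and cannot be parsed.
df = pd.DataFrame({"timestamp": ["1990-05-12 14:01",
                                 "1999-01-10 10:00",
                                 "1999-13-10 10:00"]})

# Parse into a NEW column so the original strings are never overwritten.
df["parsed"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M",
                              errors="coerce")

# Rows that could not be converted: inspect or fix these instead of dropping them.
bad_rows = df[df["parsed"].isna()]
print(bad_rows)
```

Once the offending rows are known, they can be corrected in the CSV or dropped explicitly rather than silently.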
UPDATE:
This post answers half of the question, namely how to filter hours (or minutes) out of a timestamp column. The second part is how to extract a full day into another CSV file. I'll post updates here once I find a solution.
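For the second part, one possible approach (a sketch, not tested against the real log) is to group rows by calendar date and write each group to its own file. The sample rows, the `value` and `dt` columns, and the temporary output directory are all hypothetical stand-ins for the real data and destination:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample standing in for hpwh_log.csv.
df = pd.DataFrame({"timestamp": ["1990-05-12 14:01", "1990-05-12 18:30",
                                 "1990-05-13 09:15", "1999-01-10 10:00"],
                   "value": [1, 2, 3, 4]})
df["dt"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M")

# Hypothetical output directory; a temp dir keeps the example self-contained.
out_dir = tempfile.mkdtemp()

# Group by the date part (keeping the year) and write one CSV per day.
written = []
for day, day_df in df.groupby(df["dt"].dt.strftime("%Y-%m-%d")):
    path = os.path.join(out_dir, f"{day}.csv")  # e.g. .../1990-05-12.csv
    day_df.drop(columns="dt").to_csv(path, index=False)
    written.append(path)
```

Keeping the year in the group key matters here: grouping on month and day alone would merge the same date from different years into one file.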
Answer:
You are converting to datetime twice, which is not needed. Something like this should work:
import pandas as pd
df = pd.read_csv('data.csv')
df['month_data'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M')
df['month_data'] = df['month_data'].dt.strftime('%m-%d %H:%M')
# Drop rows where month_data is NaN (e.g. rows with a missing timestamp)
df = df[df['month_data'].notna()]
print(df)
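Note that strftime('%m-%d %H:%M') turns the column into plain strings and drops the year, so the same day in different years can no longer be told apart. If the goal is filtering by hour or by day, keeping the parsed datetime column may be simpler. A minimal sketch; the sample rows and the `dt` column name are hypothetical:

```python
import datetime

import pandas as pd

# Hypothetical sample: two May 12ths from different years.
df = pd.DataFrame({"timestamp": ["1990-05-12 14:01", "1991-05-12 10:30",
                                 "1999-01-10 10:00"]})
df["dt"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M")

# The answer's string column: without the year, both May 12ths look alike.
df["month_data"] = df["dt"].dt.strftime("%m-%d %H:%M")

# Filtering on the datetime column keeps the year and allows numeric hour checks.
ten_oclock = df[df["dt"].dt.hour == 10]
one_day = df[df["dt"].dt.date == datetime.date(1991, 5, 12)]
```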