Reputation: 465
I am importing some data that has one column for date and another for the hour. I would like to combine them into a single DateTime column.
I have used
df2 = pd.read_csv(csv_file_path,
parse_dates=[['Date', 'Hour']],
infer_datetime_format=True)
which has returned
Date_Hour Unnamed: 0 Data
0 2018-01-01 0 3 51.29
1 2018-01-01 1 4 43.59
2 2018-01-01 2 5 93.60
3 2018-01-01 3 6 54.78
4 2018-01-01 4 7 14.35
but the column Date_Hour is not in the DateTime format, and when I try to convert it I get an error
df2.set_index('Date_Hour', inplace=True)
pd.to_datetime(df2.index)
gives
ValueError: ('Unknown string format:', '2018-01-01 0')
I have also tried using
pd.to_datetime(prices.index, format='%y-%m-%d %H')
The data that I have imported looks like this;
,Date,Hour,Data
3,2018-01-01,0,51.29
4,2018-01-01,1,43.59
5,2018-01-01,2,93.60
6,2018-01-01,3,54.78
7,2018-01-01,4,14.35
8,2018-01-01,5,18.60
And I would like the output to be
Data
Date_Hour
2018-01-01 00:00 51.29
2018-01-01 01:00 43.59
2018-01-01 02:00 93.60
2018-01-01 03:00 54.78
2018-01-01 04:00 14.35
Upvotes: 1
Views: 89
Reputation: 1475
Charles Landau was close. Just some changes
df
sr Date Hour Data
0 3 2018-01-01 0 51.29
1 4 2018-01-01 1 43.59
2 5 2018-01-01 2 93.60
3 6 2018-01-01 3 54.78
4 7 2018-01-01 4 14.35
5 8 2018-01-01 5 18.60
df["Date"] = pd.to_datetime(df["Date"], unit="ns")
df["Hour"] = pd.to_timedelta(df["Hour"])
df["Date_Hour"]=(df["Date"] + df["Hour"]).dt.strftime('%Y-%m-%d %H:%M')
df.set_index("Date_Hour")
sr Date Hour Data
Date_Hour
2018-01-01 00:00 3 2018-01-01 0 51.29
2018-01-01 01:00 4 2018-01-01 1 43.59
2018-01-01 02:00 5 2018-01-01 2 93.60
2018-01-01 03:00 6 2018-01-01 3 54.78
2018-01-01 04:00 7 2018-01-01 4 14.35
2018-01-01 05:00 8 2018-01-01 5 18.60
Upvotes: 1
Reputation: 4265
Edit: a few answers greatly improved on my first crack at this. I'm going to leave this up since it is referenced in another answer.
Let's start with setting the dates to datetime type.
df["Date"] = pd.to_datetime(df["Date"], unit="ms")
We want to add hours to that so they need to be Timedelta
.
df["Hour"] = df["Hour"].apply(lambda x: pd.Timedelta(hours=int(x)))
Now a simple sum:
df["Date_hour"] = df["Date"] + df["Hour"]
Set a datetime index the same way you would any other column of datetime dtype:
df = df.set_index("Date_hour")
Upvotes: 1
Reputation: 5774
In this specific case the missing leading 0
of the hour is problematic. You can avoid this importing without parsing to datetime
and then using zfill
before parsing to datetime
:
df2 = pd.read_csv(csv_file_path)
df2.index = pd.to_datetime(df2.Date + df2.Hour.astype(str).str.zfill(2), format='%Y-%m-%d%H')
To drop the Date
and Hour
columns after parsing them:
df2.drop(['Date' ,'Hour'], axis=1, inplace=True)
Short explanation:
zfill
will pad the numeric string with zeros up the the specified number of characters.
Of course you can also specify a date_parser
for pd.read_csv
, but in this case it is imho much clearer and more pythonic to parse the dates after reading the csv.
Upvotes: 1