Reputation: 21563
The time in my dataframe consists of 2 columns: date and HrMn, like this:
How can I read them into time, and plot a time series plot? (There are other value columns, for example, speed.)
I think I can get away with time.strptime('19900125'+'1200','%Y%m%d%H%M')
But the problem is that, when read from the csv, HrMn at 0000 would be parsed as 0, so time.strptime('19900125'+'0','%Y%m%d%H%M') will fail.
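One way around the lost leading zeros, sketched here under the assumption that date is always YYYYMMDD and HrMn is an HHMM value that may have been read as an integer, is to pad the time back to four digits before parsing. The helper name parse_stamp is hypothetical and only for illustration.

```python
from datetime import datetime


def parse_stamp(date, hrmn):
    """Combine a YYYYMMDD date with an HHMM time into a datetime.

    Both arguments may be ints or strings; the time is left-padded
    with zeros to four digits so that 0 becomes "0000".
    """
    stamp = str(date) + str(hrmn).zfill(4)
    return datetime.strptime(stamp, "%Y%m%d%H%M")


midnight = parse_stamp(19900125, 0)      # HrMn that lost its leading zeros
noon = parse_stamp("19900125", "1200")   # already well-formed input
```

Padding with str.zfill keeps the fixed-width format string unambiguous, which matters because the digits of the date and time are concatenated without a separator.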
UPDATE:
My current approach:
# When reading the data, parse HrMn as string
df = pd.read_csv(uipath,header=0, skipinitialspace=True, dtype={'HrMn': str})
df['time']=df.apply(lambda x:datetime.strptime("{0} {1}".format(x['date'],x['HrMn']), "%Y%m%d %H%M"),axis=1)# df.temp_date
df.index= df['time']
# Then parse it again as int
df['HrMn'] = df['HrMn'].astype(int)
Upvotes: 0
Views: 123
Reputation: 12620
I don't get why you call it "ill formatted"; that format is actually quite common and pandas can parse it as is. Just specify which columns you want to parse as timestamps.
df = pd.read_csv(uipath, skipinitialspace=True,
                 parse_dates=[['date', 'HrMn']])
Upvotes: 0
Reputation: 2448
You may parse the dates directly while reading the CSV, where HrMn is zero padded as HHMM, i.e. a value of 0 will represent 00:00:
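from datetime import datetime

df = pd.read_csv(
    uipath,
    header=0,
    skipinitialspace=True,
    dtype={'HrMn': str},
    # Combine both columns into a single 'datetime' column
    parse_dates={'datetime': ['date', 'HrMn']},
    # Zero-pad HrMn to 4 digits before parsing, so '0' becomes '0000'
    date_parser=lambda x, y: datetime.strptime('{0}{1:04.0f}'.format(x, int(y)),
                                               '%Y%m%d%H%M'),
    index_col='datetime'
)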
Upvotes: 2
Reputation: 76336
You can use pd.to_datetime
after you've transformed it into a string that looks like a date:
def to_date_str(r):
    d = r.date[:4] + '-' + r.date[4:6] + '-' + r.date[6:8]
    d += ' ' + r.HrMn[:2] + ':' + r.HrMn[2:4]
    return d
>>> pd.to_datetime(df[['date', 'HrMn']].apply(to_date_str, axis=1))
0 1990-01-25 12:00:00
dtype: datetime64[ns]
Edit
As @EdChum comments, you can do this even more simply as
pd.to_datetime(df.date.astype(str) + df.HrMn)
which string-concatenates the columns.
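For the case from the question where HrMn was read as an integer and lost its leading zeros, the same concatenation can be combined with zero padding. This is only a sketch: the two sample rows and the speed values are made up for illustration, and the explicit format string is an assumption about how the columns are laid out.

```python
import pandas as pd

# Illustrative data only; HrMn 0 stands for 00:00 after losing its zeros.
df = pd.DataFrame({
    "date": [19900125, 19900125],
    "HrMn": [0, 1200],
    "speed": [3.5, 4.1],
})

# Pad HrMn to four digits, concatenate with the date, then parse in one pass.
stamp = df["date"].astype(str) + df["HrMn"].astype(str).str.zfill(4)
df.index = pd.to_datetime(stamp, format="%Y%m%d%H%M")
```

With the timestamps as the index, a call such as df["speed"].plot() should draw the time series, provided matplotlib is installed.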
Upvotes: 2