ZK Zhao

Reputation: 21563

Pandas: How to read ill-formatted time data?

The time in my DataFrame is split across two columns, date and HrMn (for example 19900125 and 1200).


How can I combine them into a single datetime and make a time series plot? (There are other value columns as well, for example speed.)

I think I can get away with time.strptime('19900125'+'1200','%Y%m%d%H%M')

The problem is that when the CSV is read, an HrMn of 0000 is parsed as the integer 0, so time.strptime('19900125'+'0','%Y%m%d%H%M') will not parse correctly.
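A minimal sketch of the zero-padding fix using only the standard library, assuming HrMn arrives as an integer such as 0, 30 or 1200. The helper name `combine` is hypothetical, and it uses datetime.strptime rather than time.strptime so the result is a datetime object. Padding to four digits with `str.zfill` restores the HHMM layout that `%H%M` expects; without it, a value like 30 (00:30) becomes '1990012530', which strptime can reject or misread.

```python
from datetime import datetime

def combine(date, hrmn):
    """Build a datetime from a YYYYMMDD value and an HHMM value that may have lost its leading zeros."""
    # str(0).zfill(4) -> '0000', str(30).zfill(4) -> '0030'
    hhmm = str(hrmn).zfill(4)
    return datetime.strptime(str(date) + hhmm, '%Y%m%d%H%M')

print(combine(19900125, 1200))  # 1990-01-25 12:00:00
print(combine(19900125, 0))     # 1990-01-25 00:00:00
print(combine(19900125, 30))    # 1990-01-25 00:30:00
```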

UPDATE:

My current approach:

import pandas as pd
from datetime import datetime

# When reading the data, parse HrMn as a string so that 0000 keeps its leading zeros
df = pd.read_csv(uipath, header=0, skipinitialspace=True, dtype={'HrMn': str})
# Combine date and HrMn into one datetime per row
df['time'] = df.apply(lambda x: datetime.strptime("{0} {1}".format(x['date'], x['HrMn']), "%Y%m%d %H%M"), axis=1)
df.index = df['time']
# Then convert HrMn back to int
df['HrMn'] = df['HrMn'].astype(int)
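A self-contained run of this approach, assuming a small in-memory CSV (the rows below are hypothetical stand-ins for the file at uipath). Reading HrMn as str keeps '0000' and '0030' intact, and the resulting time-indexed speed column is what a time series plot would be drawn from.

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical sample data standing in for the real file at uipath
csv_text = """date, HrMn, speed
19900125, 0000, 3.1
19900125, 0030, 2.7
19900125, 1200, 4.0
"""

# Reading HrMn as str keeps the leading zeros of '0000' and '0030'
df = pd.read_csv(io.StringIO(csv_text), header=0, skipinitialspace=True,
                 dtype={'HrMn': str})

# Same row-wise combination as in the question
df['time'] = df.apply(
    lambda x: datetime.strptime('{0} {1}'.format(x['date'], x['HrMn']), '%Y%m%d %H%M'),
    axis=1)
df.index = df['time']
df['HrMn'] = df['HrMn'].astype(int)

# A Series indexed by time; with matplotlib installed, speed.plot() draws it
speed = df['speed']
print(speed)
```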

Upvotes: 0

Views: 123

Answers (3)

Stop harming Monica

Reputation: 12620

I don't see why you call it "ill-formatted"; that format is actually quite common and pandas can parse it as is. Just specify which columns you want to combine and parse as timestamps:

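import pandas as pd

# Combining columns via a nested list in parse_dates is deprecated since pandas 2.2
df = pd.read_csv(uipath, skipinitialspace=True, parse_dates=[['date', 'HrMn']])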
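Since combining columns through parse_dates is deprecated in recent pandas, here is a minimal sketch of an equivalent that should work on current versions. It assumes both columns can be read as strings, and the sample rows are hypothetical. The column name date_HrMn mirrors the name the nested-list form produced.

```python
import io

import pandas as pd

# Hypothetical sample rows in the question's layout
csv_text = """date, HrMn, speed
19900125, 0000, 3.1
19900125, 1200, 4.0
"""

# Read both parts as strings so '0000' is not turned into 0
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True,
                 dtype={'date': str, 'HrMn': str})

# Concatenate 'YYYYMMDD' and 'HHMM', then parse with an explicit format
df['date_HrMn'] = pd.to_datetime(df['date'] + df['HrMn'], format='%Y%m%d%H%M')
df = df.set_index('date_HrMn')
```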

Upvotes: 0

fernandezcuesta

Reputation: 2448

You can parse the dates directly while reading the CSV by zero-padding HrMn to HHMM, so that a value of 0 represents 00:00:

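import pandas as pd
from datetime import datetime

# pd.datetime was removed in pandas 2.0, so datetime.strptime is used instead;
# note that date_parser itself is deprecated since pandas 2.0.
df = pd.read_csv(
    uipath,
    header=0,
    skipinitialspace=True,
    dtype={'HrMn': str},
    parse_dates={'datetime': ['date', 'HrMn']},
    # Zero-pad HrMn to four digits (HHMM), so '0' becomes '0000'
    date_parser=lambda x, y: datetime.strptime('{0}{1:04.0f}'.format(x, int(y)),
                                               '%Y%m%d%H%M'),
    index_col='datetime'
)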
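On newer pandas the same zero-padding idea can be applied after reading, without a custom date_parser. A minimal sketch, assuming HrMn is stored in the file without padding (0, 30, 1200) and is therefore read as an integer; the sample rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample where HrMn has lost its zero padding in the file itself
csv_text = """date, HrMn, speed
19900125, 0, 3.1
19900125, 30, 2.7
19900125, 1200, 4.0
"""

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Zero-pad HrMn to four digits (0 -> '0000', 30 -> '0030'), like '{1:04.0f}' above
hhmm = df['HrMn'].astype(str).str.zfill(4)
df['datetime'] = pd.to_datetime(df['date'].astype(str) + hhmm, format='%Y%m%d%H%M')
df = df.set_index('datetime')
```

This keeps the parsing vectorized instead of calling strptime once per row.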

Upvotes: 2

Ami Tavory

Reputation: 76336

You can use pd.to_datetime after transforming each row into a string that looks like a date:

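import pandas as pd

# Assumes both columns were read as strings, e.g. dtype={'date': str, 'HrMn': str}
def to_date_str(r):
    # 'YYYYMMDD' -> 'YYYY-MM-DD'
    d = r.date[:4] + '-' + r.date[4:6] + '-' + r.date[6:8]
    # 'HHMM' -> ' HH:MM'
    d += ' ' + r.HrMn[:2] + ':' + r.HrMn[2:4]
    return d

>>> pd.to_datetime(df[['date', 'HrMn']].apply(to_date_str, axis=1))
0   1990-01-25 12:00:00
dtype: datetime64[ns]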

Edit

As @EdChum commented, you can do this even more simply:

pd.to_datetime(df.date.astype(str) + df.HrMn, format='%Y%m%d%H%M')

which concatenates the two columns as strings. This assumes HrMn was read as a zero-padded string, e.g. with dtype={'HrMn': str}.
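A string-free alternative, not part of the original answers: pd.to_datetime also accepts a DataFrame whose columns are named year, month, day, hour and minute, so the parts can be split out with integer division and modulo. This sketch assumes both columns are read as integers (the default without a dtype override), and the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical integer columns, as read_csv would produce without a dtype override
df = pd.DataFrame({'date': [19900125, 19900125, 19900126],
                   'HrMn': [0, 30, 1200]})

# Split YYYYMMDD and HHMM arithmetically, so lost leading zeros do not matter
parts = pd.DataFrame({
    'year': df['date'] // 10000,
    'month': df['date'] // 100 % 100,
    'day': df['date'] % 100,
    'hour': df['HrMn'] // 100,
    'minute': df['HrMn'] % 100,
})
df['time'] = pd.to_datetime(parts)
```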

Upvotes: 2
