Reputation: 2610
I have data where one column, say column 0, contains strings like '%Y-%m-%d %H:%M:%S' and another column, say column 1, containing integers with the hundreds of seconds. I want to read both columns together into the index of the data frame. Using
parse = lambda d: dt.datetime.strptime(d,'%Y-%m-%d %H:%M:%S %f')
df = pd.read_csv(myFile, sep=';', index_col=0, parse_dates=[[0,1]], \
keep_date_col=True, date_parser=parse)
however, will treat all integers 1,2,...9 as if they would denote 10,20,...90 hundreds of a second. E.g. '2013-3-27 09:00:01' in column 0 and 9 in column 1 would be translatetd into Timestamp('2013-03-27 09:00:01.900000', tz=None), instead of Timestamp('2013-03-27 09:00:01.090000', tz=None)
I guess that the date_parser function takes the 9 as '9', but would need to interpreted it as '09'. How can I solve this?
EDIT:
df = pd.read_csv(myFile, sep=';')
# with column 'TIMESTAMP' containing the strings and column 'HSEC' containing \
# the ints with the hundreds of seconds
df['newTimestamp'] = pd.to_datetime(df['TIMESTAMP'],format='%Y-%m-%d %H:%M:%S').add(pd.to_timedelta(dataOB['HSEC']*10000000)
dataOB.set_index('new',inplace=True)
dataOB.sort_index(inplace=True)
(somehow, the solution presents itself often only after I post my question here, despite hours of looking for it before approaching the forum. Still, hope its useful to someone else, too.)
Upvotes: 1
Views: 760
Reputation: 52276
Some dummy data
df = pd.read_csv(StringIO("""col1;col2;col3
2014-07-16 14:23:46;1;12
2014-07-16 14:23:53;5;12
2014-07-16 14:23:55;10;12
2014-07-16 14:23:59;15;12
2014-07-16 14:23:59;20;12
2014-07-16 14:24:00;25;12"""), sep=';')
Rather than handle it all in the read_csv
step, it might be easier to read the data first, then combine the columns, like this?
df['date'] = df['col1'] + '.' + df['col2'].apply(lambda x: str(x).zfill(2))
Then you can pass the combined column to pd.to_datetime
and set your index.
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
Upvotes: 1