user3820991
user3820991

Reputation: 2610

pandas - parsing column containing strings and column containing int into datetime

I have data where one column, say column 0, contains strings like '%Y-%m-%d %H:%M:%S' and another column, say column 1, containing integers with the hundreds of seconds. I want to read both columns together into the index of the data frame. Using

parse = lambda d: dt.datetime.strptime(d,'%Y-%m-%d %H:%M:%S %f')
df = pd.read_csv(myFile, sep=';', index_col=0, parse_dates=[[0,1]], \
                 keep_date_col=True, date_parser=parse)

however, will treat all integers 1,2,...9 as if they would denote 10,20,...90 hundreds of a second. E.g. '2013-3-27 09:00:01' in column 0 and 9 in column 1 would be translatetd into Timestamp('2013-03-27 09:00:01.900000', tz=None), instead of Timestamp('2013-03-27 09:00:01.090000', tz=None)

I guess that the date_parser function takes the 9 as '9', but would need to interpreted it as '09'. How can I solve this?

EDIT:

df = pd.read_csv(myFile, sep=';') 
# with column 'TIMESTAMP' containing the strings and column 'HSEC' containing \
# the ints with the hundreds of seconds

df['newTimestamp'] = pd.to_datetime(df['TIMESTAMP'],format='%Y-%m-%d %H:%M:%S').add(pd.to_timedelta(dataOB['HSEC']*10000000) 
dataOB.set_index('new',inplace=True)
dataOB.sort_index(inplace=True)

(somehow, the solution presents itself often only after I post my question here, despite hours of looking for it before approaching the forum. Still, hope its useful to someone else, too.)

Upvotes: 1

Views: 760

Answers (1)

chrisb
chrisb

Reputation: 52276

Some dummy data

df = pd.read_csv(StringIO("""col1;col2;col3
    2014-07-16 14:23:46;1;12
    2014-07-16 14:23:53;5;12
    2014-07-16 14:23:55;10;12
    2014-07-16 14:23:59;15;12
    2014-07-16 14:23:59;20;12
    2014-07-16 14:24:00;25;12"""), sep=';')

Rather than handle it all in the read_csv step, it might be easier to read the data first, then combine the columns, like this?

df['date'] = df['col1'] + '.' +  df['col2'].apply(lambda x: str(x).zfill(2))

Then you can pass the combined column to pd.to_datetime and set your index.

df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

Upvotes: 1

Related Questions