Reputation: 26329
I need to process a huge number of CSV files where the timestamp is always a string representing the Unix timestamp in milliseconds. I have not yet found a method to modify these columns efficiently.
This is what I came up with. However, it of course only produces a converted copy of the column, which I then have to put back into the original dataset. I'm sure it can also be done directly when creating the DataFrame?
import sys
import datetime

if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
df = pd.read_csv(StringIO(data))

# convert milliseconds since the epoch to a datetime
convert = lambda x: datetime.datetime.fromtimestamp(x / 1e3)
converted_df = df['UNIXTIME'].apply(convert)
This picks the 'UNIXTIME' column and changes it from
0 1447160702320
1 1447160702364
2 1447160722364
Name: UNIXTIME, dtype: int64
into this
0 2015-11-10 14:05:02.320
1 2015-11-10 14:05:02.364
2 2015-11-10 14:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]
However, I would like to use something like pd.apply() to get the whole DataFrame back with the converted column, or, as I already wrote, simply create the datetimes while generating the DataFrame from the CSV.
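For what it's worth, a minimal sketch of putting the converted column back (reusing the convert lambda above) would simply assign the result over the original column:

# overwrite the original column with the converted values (sketch)
df['UNIXTIME'] = df['UNIXTIME'].apply(convert)

but I suspect there is a more idiomatic way.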
Upvotes: 65
Views: 86795
Reputation: 2670
I use @EdChum's solution, but with timezone handling added:
df['UNIXTIME'] = pd.DatetimeIndex(pd.to_datetime(df['UNIXTIME'], unit='ms'))\
                   .tz_localize('UTC')\
                   .tz_convert('America/New_York')
tz_localize indicates that the timestamps should be interpreted as 'UTC', and tz_convert then moves the date/time to the correct timezone (in this case 'America/New_York').
Note that the series is wrapped in a DatetimeIndex because the tz_ methods work only on the index of a Series. Since pandas 0.15 one can use .dt instead:
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')\
                   .dt.tz_localize('UTC')\
                   .dt.tz_convert('America/New_York')
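A self-contained sketch of the .dt variant, using the sample data from the question (America/New_York is UTC-5 on that date, so the times shift accordingly):

import pandas as pd
from io import StringIO

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'
df = pd.read_csv(StringIO(data))

# parse milliseconds as naive UTC timestamps, localize to UTC, then convert
df['UNIXTIME'] = (pd.to_datetime(df['UNIXTIME'], unit='ms')
                    .dt.tz_localize('UTC')
                    .dt.tz_convert('America/New_York'))

print(df['UNIXTIME'].iloc[0])  # 2015-11-10 08:05:02.320000-05:00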
Upvotes: 14
Reputation: 403278
If you know the timestamp unit, use Series.astype:
df['UNIXTIME'].astype('datetime64[ms]')
0 2015-11-10 13:05:02.320
1 2015-11-10 13:05:02.364
2 2015-11-10 13:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]
To return the entire DataFrame, use
df.astype({'UNIXTIME': 'datetime64[ms]'})
RUN UNIXTIME VALUE
0 1 2015-11-10 13:05:02.320 10
1 2 2015-11-10 13:05:02.364 20
2 3 2015-11-10 13:05:22.364 42
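A minimal end-to-end sketch with the question's sample data (note: depending on your pandas version, the resulting dtype may be reported as datetime64[ns] or datetime64[ms], but the values are interpreted as milliseconds since the epoch either way):

import pandas as pd
from io import StringIO

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'

# cast the integer column to a datetime dtype right after reading
df = pd.read_csv(StringIO(data)).astype({'UNIXTIME': 'datetime64[ms]'})
print(df.dtypes)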
Upvotes: 4
Reputation: 394469
You can do this as a post-processing step using to_datetime, passing unit='ms':
In [5]:
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
df
Out[5]:
RUN UNIXTIME VALUE
0 1 2015-11-10 13:05:02.320 10
1 2 2015-11-10 13:05:02.364 20
2 3 2015-11-10 13:05:22.364 42
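If you want the conversion folded into the read step itself, here is a sketch using DataFrame.assign right after read_csv:

import pandas as pd
from io import StringIO

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'

# read and convert in one chained expression
df = (pd.read_csv(StringIO(data))
        .assign(UNIXTIME=lambda d: pd.to_datetime(d['UNIXTIME'], unit='ms')))
print(df)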
Upvotes: 111
Reputation: 26329
I came up with a solution, I guess:
# parse 'UNIXTIME' while reading, converting each raw string (ms since epoch)
convert = lambda x: datetime.datetime.fromtimestamp(float(x) / 1e3)
df = pd.read_csv(StringIO(data), parse_dates=['UNIXTIME'], date_parser=convert)
I'm still not sure if this is the best one though.
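Note that in recent pandas versions (2.0+) the date_parser argument of read_csv is deprecated, so a sketch of a forward-compatible variant is to read the column as plain integers and convert afterwards, as in the to_datetime answer above:

import pandas as pd
from io import StringIO

data = 'RUN,UNIXTIME,VALUE\n1,1447160702320,10\n2,1447160702364,20\n3,1447160722364,42'

# read normally, then convert the millisecond timestamps in a second step
df = pd.read_csv(StringIO(data))
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')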
Upvotes: 3