Puggie
Puggie

Reputation: 3987

How to read timezone aware datetimes as a timezone naive local DatetimeIndex with read_csv in pandas?

When I use pandas read_csv to read a column with a timezone aware datetime (and specify this column to be the index), pandas converts it to a timezone naive utc DatetimeIndex.

Data in Test.csv:

DateTime,Temperature 2016-07-01T11:05:07+02:00,21.125 2016-07-01T11:05:09+02:00,21.138 2016-07-01T11:05:10+02:00,21.156 2016-07-01T11:05:11+02:00,21.179 2016-07-01T11:05:12+02:00,21.198 2016-07-01T11:05:13+02:00,21.206 2016-07-01T11:05:14+02:00,21.225 2016-07-01T11:05:15+02:00,21.233

Code to read from csv:

In [1]: import pandas as pd

In [2]: df = pd.read_csv('Test.csv', index_col=0, parse_dates=True)

This results in an index that represents the timezone naive utc time:

In [3]: df.index

Out[3]: DatetimeIndex(['2016-07-01 09:05:07', '2016-07-01 09:05:09',
           '2016-07-01 09:05:10', '2016-07-01 09:05:11',
           '2016-07-01 09:05:12', '2016-07-01 09:05:13',
           '2016-07-01 09:05:14', '2016-07-01 09:05:15'],
          dtype='datetime64[ns]', name='DateTime', freq=None)

I tried to use a date_parser function:

In [4]: date_parser = lambda x: pd.to_datetime(x).tz_localize(None)

In [5]: df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

This gave the same result.

How can I make read_csv create a DatetimeIndex that is timezone naive and represents the local time instead of the utc time?

I'm using pandas 0.18.1.

Upvotes: 5

Views: 3040

Answers (4)

Alcofribas Nasier
Alcofribas Nasier

Reputation: 1

You can even try :

date_parser = lambda x : pd.to_datetime(x.str[:-6])

Upvotes: -1

patricktokeeffe
patricktokeeffe

Reputation: 1108

I adopted the dateutil technique earlier today but have since switched to a faster alternative:

date_parser = lambda ts: pd.to_datetime([s[:-5] for s in ts]))

Edit: s[:-5] is correct (screenshot has error)

In the screenshot below, I import ~55MB of tab-separated files. The dateutil method works, but takes orders of magnitude longer.

enter image description here

This was using pandas 0.18.1 and dateutil 2.5.3.


EDIT This lambda function will work even if Z-0000 suffix is missing...

date_parser = lambda ts: pd.to_datetime([s[:-5] if 'Z' in s else s for s in ts])

Upvotes: 1

Puggie
Puggie

Reputation: 3987

The answer of Alex leads to a timezone aware DatetimeIndex. To get a timezone naive local DatetimeIndex, as asked by the OP, inform dateutil.parser.parser to ignore the timezone information by setting ignoretz=True:

import dateutil

date_parser = lambda x: dateutil.parser.parse(x, ignoretz=True)
df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

print(df)

outputs

                     Temperature
DateTime                        
2016-07-01 11:05:07       21.125
2016-07-01 11:05:09       21.138
2016-07-01 11:05:10       21.156
2016-07-01 11:05:11       21.179
2016-07-01 11:05:12       21.198
2016-07-01 11:05:13       21.206
2016-07-01 11:05:14       21.225
2016-07-01 11:05:15       21.233

Upvotes: 4

Alex
Alex

Reputation: 19104

According to the docs the default date_parser uses dateutil.parser.parser. According to the docs for that function, the default is to ignore timezones. So if you supply dateutil.parser.parser as the date_parser kwarg, timezones are not converted.

import dateutil

df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=dateutil.parser.parse)

print(df)

outputs

                           Temperature
DateTime                              
2016-07-01 11:05:07+02:00       21.125
2016-07-01 11:05:09+02:00       21.138
2016-07-01 11:05:10+02:00       21.156
2016-07-01 11:05:11+02:00       21.179
2016-07-01 11:05:12+02:00       21.198
2016-07-01 11:05:13+02:00       21.206
2016-07-01 11:05:14+02:00       21.225
2016-07-01 11:05:15+02:00       21.233

Upvotes: 4

Related Questions