Reputation: 983
1. The problem
I have data in a csv file with timestamps. Here is an example of the format of these timestamps:
'Fri Aug 18 2017 00:00:00 GMT+0200 (heure d%u2019été d%u2019Europe centrale)'
Pandas does not seem to be able to process such a string formatting. Using the to_datetime function to check if pandas is able to process it, I have the following error:
In [3]: pd.to_datetime('Fri Aug 18 2017 00:00:00 GMT+0200 (heure d%u2019été d%u2019Europe centrale)')
Out[3]: raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', 'Fri Aug 18 2017 00:00:00 GMT+0200 (heure d%u2019été d%u2019Europe centrale)')
2. Logic identified
But if I remove the blabla, it appears to work:
In [4]: pd.to_datetime('Fri Aug 18 2017 00:00:00 GMT+0200')
Out[4]: Timestamp('2017-08-18 00:00:00-0200', tz='pytz.FixedOffset(-120)')
So the solution to have pandas understanding correctly this column as timestamps with time zones is 1st to strip the blabla:
' (heure d%u2019été d%u2019Europe centrale)'
Then to apply the to_datetime function.
3. Implementation
So, I tried to use the following lines of code, and it just is not working.
Dataframe is:
,date
0,Fri Aug 18 2017 00:00:00 GMT+0200 (heure d%u2019été d%u2019Europe centrale)
1,Sat Aug 19 2017 00:00:00 GMT+0200 (heure d%u2019été d%u2019Europe centrale)
Dataframe is stored as 'GC' I tried:
for ts in GC[1][:]:
ts.replace(' (heure d%u2019été d%u2019Europe centrale)')
ts.to_datetime()
I have this output (not very clear to me - I am using spyder, this may be not very helping)
File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1
Please, what would be the correct approach? I thank you in advance for your help.
Have a good day,
Bests,
Pierrot
Upvotes: 1
Views: 67
Reputation: 863166
Use Series.str.replace
with column date
and regex=False
for avoid read ()
like regex:
s = df['date'].str.replace(' (heure d%u2019été d%u2019Europe centrale)', '', regex=False)
df['date'] = pd.to_datetime(s)
Or you can split values by \)
with escape )
because regex special character:
df['date'] = pd.to_datetime(df['date'].str.split(' \(').str[0])
Upvotes: 1