Reputation: 23
I am trying to combine date and time columns of a csv file and convert them to timestamp using pandas.
Here is a sample of my csv file when read into a dataframe
Id Station Month Parameter Date From To
1.0 ANANDVIHAR Dec ?PM2.5 2015-12-01 ?00:00:00 ?00:59:00
The Following Code:-
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)
Is giving the following error:-
Traceback (most recent call last):
File "project101.py", line 36, in <module>
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)
File "c:\Python27\lib\site-packages\pandas\core\frame.py", line 4133, in apply
return self._apply_standard(f, axis, reduce=reduce)
File "c:\Python27\lib\site-packages\pandas\core\frame.py", line 4229, in _apply_standard
results[i] = func(v)
File "project101.py", line 36, in <lambda>
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)
File "c:\Python27\lib\_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: ("time data '2015-12-01:\\xa000:00:00' does not match format '%Y.%m.%d:%H:%M:%S'", u'occurred at index 0')
Upvotes: 2
Views: 132
Reputation: 23
I finally got a solution I stripped the question marks before the date column and applied to_datetime() to the column of the dataframe
df['From'] = df['From'].map(lambda x: str(x)[1:]) df['FromTime'] = pd.to_datetime(df['Date'].str.cat(df['From'], sep=" "),format='%Y-%m-%d %H:%M:%S', errors='coerce')
Upvotes: 0
Reputation: 8703
You can simply do:
df['DateTime'] = pd.to_datetime(df['Date'].str.cat(df['From'], sep=" "),
format='%Y-%m-%d \\xa%H:%M:%S', errors='coerce')
The '\\xa'
in the format specifier will take care of the question marks. Those marks are for misinterpreted literal, which probably looks like '\\xa'
Upvotes: 1
Reputation: 195
You can use pandas.Series.str.cat function.
Following code gives you a basic idea about this:
>>> Series(['a', 'b', 'c']).str.cat(['A', 'B', 'C'], sep=',')
0 a,A
1 b,B
2 c,C
dtype: object
For more information, please check this:
http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.str.cat.html
Hope this solves your problem...
Upvotes: 0