Niteesh
Niteesh

Reputation: 23

Combining columns using pandas

I am trying to combine date and time columns of a csv file and convert them to timestamp using pandas.

Here is a sample of my csv file when read into a dataframe

Dataframe after reading

Id     Station        Month       Parameter    Date        From       To
1.0    ANANDVIHAR     Dec         ?PM2.5       2015-12-01  ?00:00:00  ?00:59:00

The Following Code:-

df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)

Is giving the following error:-

Traceback (most recent call last):

  File "project101.py", line 36, in <module>
    df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)

  File "c:\Python27\lib\site-packages\pandas\core\frame.py", line 4133, in apply
    return self._apply_standard(f, axis, reduce=reduce)

 File "c:\Python27\lib\site-packages\pandas\core\frame.py", line 4229, in _apply_standard
    results[i] = func(v)

  File "project101.py", line 36, in <lambda>
    df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['Date']+ ':' + row['From'], '%Y.%m.%d:%H:%M:%S'), axis=1)

  File "c:\Python27\lib\_strptime.py", line 332, in _strptime
    (data_string, format))

ValueError: ("time data '2015-12-01:\\xa000:00:00' does not match format '%Y.%m.%d:%H:%M:%S'", u'occurred at index 0')

Upvotes: 2

Views: 132

Answers (3)

Niteesh
Niteesh

Reputation: 23

I finally got a solution I stripped the question marks before the date column and applied to_datetime() to the column of the dataframe

df['From'] = df['From'].map(lambda x: str(x)[1:])
df['FromTime'] = pd.to_datetime(df['Date'].str.cat(df['From'], sep=" "),format='%Y-%m-%d %H:%M:%S', errors='coerce')

Upvotes: 0

Kartik
Kartik

Reputation: 8703

You can simply do:

df['DateTime'] = pd.to_datetime(df['Date'].str.cat(df['From'], sep=" "),
                                format='%Y-%m-%d \\xa%H:%M:%S', errors='coerce')

The '\\xa' in the format specifier will take care of the question marks. Those marks are for misinterpreted literal, which probably looks like '\\xa'

Upvotes: 1

sansingh
sansingh

Reputation: 195

You can use pandas.Series.str.cat function.

Following code gives you a basic idea about this:

>>> Series(['a', 'b', 'c']).str.cat(['A', 'B', 'C'], sep=',')
0    a,A
1    b,B
2    c,C
dtype: object

For more information, please check this:

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.str.cat.html

Hope this solves your problem...

Upvotes: 0

Related Questions