sparrow

Reputation: 11460

How to split a column into two separate ones in a DataFrame with Pandas

How can I split a column into two separate ones? Would apply be the way to go about this? I want to keep the other columns in the DataFrame.

For example, I have a column called "last_created" containing date-time strings such as "2016-07-01 09:50:09".

I want to create two new columns "date" and "time" with the split values.

This is what I tried, but it raises an error. For some reason my data was being converted from str to float, so I forced it to str.

def splitter(row):
    row = str(row)
    return row.split()

df['date'],df['time'] = df['last_created'].apply(splitter)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-47-e5a9cf968714> in <module>()
      7     return row.split()
      8 
----> 9 df['date'],df['time'] = df['last_created'].apply(splitter)
     10 df
     11 #splitter(df.iloc[1,1])

ValueError: too many values to unpack (expected 2)

Upvotes: 1

Views: 667

Answers (3)

jezrael

Reputation: 862661

You can first convert the column with to_datetime if its dtype is object, and then use dt.date and dt.time:

import pandas as pd

df = pd.DataFrame({'last_created':['2016-07-01 09:50:09', '2016-07-01 09:50:09']})
print (df)
          last_created
0  2016-07-01 09:50:09
1  2016-07-01 09:50:09

print (df.dtypes)
last_created    object
dtype: object

df['last_created'] = pd.to_datetime(df.last_created)

print (df.dtypes)
last_created    datetime64[ns]
dtype: object

df['date'], df['time'] = df.last_created.dt.date, df.last_created.dt.time
print (df)
         last_created        date      time
0 2016-07-01 09:50:09  2016-07-01  09:50:09
1 2016-07-01 09:50:09  2016-07-01  09:50:09
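
Note that dt.date and dt.time produce Python date and time objects, so the new columns have object dtype. If plain strings are wanted instead, a minimal sketch using dt.strftime could look like this (the sample rows and format strings are assumptions for illustration, not part of the original data):

```python
import pandas as pd

# Hypothetical sample data in the same format as the question
df = pd.DataFrame({'last_created': ['2016-07-01 09:50:09', '2016-07-01 17:05:30']})
df['last_created'] = pd.to_datetime(df['last_created'])

# dt.strftime formats each timestamp as a string
df['date'] = df['last_created'].dt.strftime('%Y-%m-%d')
df['time'] = df['last_created'].dt.strftime('%H:%M:%S')
print(df)
```

Keeping last_created as datetime64 alongside the string columns means date arithmetic is still available on the original column.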

Upvotes: 1

user3404344

Reputation: 1727

The following should work for you. However, storing the date and time as a single timestamp is much more convenient for manipulation.

df['date'] = [d.split()[0] for d in df['last_created']]
df['time'] = [d.split()[1] for d in df['last_created']]
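
The two list comprehensions each split every string once. As an alternative sketch (a different technique from the one above, not the answerer's code), the vectorized str.split with expand=True splits each value a single time and returns one column per piece. The sample rows here are assumed for illustration:

```python
import pandas as pd

# Hypothetical sample data in the same format as the question
df = pd.DataFrame({'last_created': ['2016-07-01 09:50:09', '2016-07-02 10:15:00']})

# Split on the first space only; expand=True returns a DataFrame
# with column 0 holding the date part and column 1 the time part
parts = df['last_created'].str.split(' ', n=1, expand=True)
df['date'] = parts[0]
df['time'] = parts[1]
print(df)
```

The other columns of df are left untouched, since only two new columns are assigned.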

Upvotes: 1

spritecodej

Reputation: 459

In my case, I just call the function directly. The IPython session is below.

In [5]: df = dict(data="", time="", last_created="")

In [6]: df
Out[6]: {'data': '', 'last_created': '', 'time': ''}

In [7]: df["last_created"] = "2016-07-01 09:50:09"

In [8]: df
Out[8]: {'data': '', 'last_created': '2016-07-01 09:50:09', 'time': ''}

In [9]: def splitter(row):
   ...:     row = str(row)
   ...:     return row.split()

In [10]: df["data"], df["time"] = splitter(df["last_created"])

In [11]: df
Out[11]:
{'data': '2016-07-01',
 'last_created': '2016-07-01 09:50:09',
 'time': '09:50:09'}
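
The session above unpacks the result for a single string held in a dict. On a DataFrame column, apply(splitter) returns one [date, time] list per row, so unpacking it into two names fails as soon as there are more than two rows. A minimal sketch of one way around that, assuming every value contains exactly one space, is to transpose the per-row lists with zip (the three sample rows are made up for illustration):

```python
import pandas as pd

def splitter(value):
    # Same idea as the function above: force to str, split on whitespace
    return str(value).split()

# Hypothetical sample data; three rows so a naive two-name unpack would fail
df = pd.DataFrame({'last_created': ['2016-07-01 09:50:09',
                                    '2016-07-02 10:15:00',
                                    '2016-07-03 23:59:59']})

# zip(*...) turns the per-row [date, time] lists into
# one tuple of all dates and one tuple of all times
df['date'], df['time'] = zip(*df['last_created'].apply(splitter))
print(df)
```

This sketch would raise an error if any row split into something other than two pieces, for example a missing value that str() turns into 'nan'.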

Upvotes: 1
