Reputation: 434
I have a Pandas DataFrame called ebola, as seen below. The variable column holds two pieces of information: status (whether it is Cases or Deaths) and country (the country name). I am trying to create two new columns, status and country, out of that variable column by using the .apply() function. However, since there are two values I am trying to extract, it does not work.
# let's create a splitter function
def splitter(column):
    status, country = column.split("_")
    return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].apply(splitter)
The error I get is
ValueError: Must have equal len keys and value when setting with an iterable
I want my output to be like this
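A minimal sketch of what goes wrong, using the sample rows that are posted later in this thread. Series.apply with a function that returns a tuple gives back a single Series whose elements are tuples, not two columns, so the two target columns have nothing to line up with. Turning that list of tuples into a two-column DataFrame is one possible fix (not the only one); the parameter name value is just illustrative.

```python
import pandas as pd

# sample rows taken from the data posted in this thread
ebola = pd.DataFrame({
    "Date": ["3/24/2014", "3/22/2014", "1/15/2015", "1/4/2015"],
    "variable": ["Cases_Guinea", "Cases_Guinea", "Cases_Liberia", "Cases_Liberia"],
})

def splitter(value):
    status, country = value.split("_")
    return status, country

# Series.apply returns ONE Series whose elements are (status, country) tuples
result = ebola["variable"].apply(splitter)
print(type(result).__name__, result.tolist())

# build a two-column DataFrame from the tuples, keeping the original index,
# so each tuple position maps onto one of the two new columns
ebola[["status", "country"]] = pd.DataFrame(result.tolist(), index=ebola.index)
print(ebola)
```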
Upvotes: 0
Views: 111
Reputation: 434
This is a very late follow-up to my original question. Thanks to @ansev, whose solution worked great. While going back through my question, I tried to develop a solution based on my first approach. I was able to make it work, and I wanted to share it for anyone who might want to see a different perspective on this.
Here is the updated code:
# let's create a splitter function
def splitter(column):
    # with apply(axis=1), "column" is actually one row (a Series);
    # iterating over it yields that row's only value, the string
    for row in column:
        status, country = row.split("_")
    return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')
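I made two updates to my code so it could work:
1. I converted the variable Series into a DataFrame with the .to_frame() method, because DataFrame.apply (unlike Series.apply) accepts axis=1 and result_type='expand', which turns the returned tuple into two columns.
2. In the splitter function, I had to iterate over each row, since apply(axis=1) passes the function a row of the DataFrame rather than a plain string. Therefore, I added the for row in column line.

To replicate all of this: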
import numpy as np
import pandas as pd

# create the data
ebola_dict = {'Date': ['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
              'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia']}
ebola = pd.DataFrame(ebola_dict)
print(ebola)

# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
    return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')

# check if it worked
print(ebola)
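A variant of the same DataFrame.apply(axis=1, result_type='expand') idea, offered as a sketch rather than the author's code. Because each row arrives as a Series indexed by column name, the string can be read with row["variable"] instead of looping over the row. The returned tuple is expanded into two columns labelled 0 and 1, which the list-of-keys assignment then maps onto status and country by position. The helper name split_row is hypothetical.

```python
import pandas as pd

ebola = pd.DataFrame({
    "Date": ["3/24/2014", "3/22/2014", "1/15/2015", "1/4/2015"],
    "variable": ["Cases_Guinea", "Cases_Guinea", "Cases_Liberia", "Cases_Liberia"],
})

def split_row(row):
    # hypothetical helper; with axis=1, row is a Series keyed by column name
    status, country = row["variable"].split("_")
    return status, country

# result_type="expand" turns each returned tuple into a row of two columns (0 and 1)
expanded = ebola["variable"].to_frame().apply(split_row, axis=1, result_type="expand")
print(expanded.columns.tolist())

# assigning to a list of keys pairs the new columns with expanded's columns by position
ebola[["status", "country"]] = expanded
print(ebola)
```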
Upvotes: 0
Reputation: 30930
Use Series.str.split with expand=True:
ebola[['status', 'country']] = ebola['variable'].str.split(pat='_', expand=True)
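A small runnable version of this answer on the sample data posted in this thread. With expand=True, str.split returns a DataFrame with one column per piece, which is what the two-column assignment needs. The n=1 argument is an addition, not part of the answer: it limits the split to the first underscore, a precaution in case a country label itself contained an underscore ("Deaths_Sierra_Leone" below is a hypothetical label; the sample data has none).

```python
import pandas as pd

ebola = pd.DataFrame({
    "Date": ["3/24/2014", "3/22/2014", "1/15/2015", "1/4/2015"],
    "variable": ["Cases_Guinea", "Cases_Guinea", "Cases_Liberia", "Cases_Liberia"],
})

# expand=True gives a DataFrame with columns 0 and 1 instead of a Series of lists
parts = ebola["variable"].str.split(pat="_", n=1, expand=True)
ebola[["status", "country"]] = parts
print(ebola)

# hypothetical label with an extra underscore: n=1 keeps the remainder intact
demo = pd.Series(["Deaths_Sierra_Leone"]).str.split(pat="_", n=1, expand=True)
print(demo.iloc[0].tolist())
```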
Upvotes: 1