Reputation: 95
I have a data frame (see example df) and I need to split column 2 into two new columns (see example df_exp): the leading number, if there is one, and the remaining code.
import numpy as np
import pandas as pd
# given df
df = pd.DataFrame(np.array([["Joe", 25, "40 RF"], ["Sam", 5, "RM"], ["Roy", 8, "50 SD"]]), columns=[0, 1, 2])
# expected df
df_exp = pd.DataFrame(np.array([["Joe", 25, "40 RF", 40, "RF"], ["Sam", 5, "RM", None, "RM"], ["Roy", 8, "50 SD", 50, "SD"]]), columns=[0, 1, 2, 2.1, 2.2])
I have the following function:
def split_string(string):
    # "40 RF" -> ("40", "RF"); "RM" -> (None, "RM")
    if string[0].isnumeric():
        sep = string.split(" ", 1)
        return sep[0], sep[1]
    else:
        return None, string
I tried to apply it as shown below but got an error. What is the best way to split a column using a function?
df[[21, 2.2]] = df.apply(lambda x: split_string(df.ix[:, 2]), axis = 1)
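For comparison, a corrected form of the row-wise attempt might look like the sketch below. Three problems are visible in the call above: the lambda ignores its argument x and passes a whole column instead of one row's value, .ix was removed in pandas 1.0, and 21 looks like a typo for 2.1. The sketch assumes pandas 0.23 or later, where DataFrame.apply accepts result_type="expand"; it is one possible fix, not the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([["Joe", 25, "40 RF"], ["Sam", 5, "RM"], ["Roy", 8, "50 SD"]]), columns=[0, 1, 2])

def split_string(string):
    # Leading digit means "<number> <code>"; otherwise the whole string is the code
    if string[0].isnumeric():
        sep = string.split(" ", 1)
        return sep[0], sep[1]
    return None, string

# Pass each row's value in column 2 (row[2]), not the whole column;
# result_type="expand" turns each returned tuple into two columns labelled 0 and 1
parts = df.apply(lambda row: split_string(row[2]), axis=1, result_type="expand")
df[2.1] = parts[0]
df[2.2] = parts[1]
print(df)
```

The number column stays as text here, matching what split_string returns.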
Upvotes: 1
Views: 135
Reputation: 120479
import re

def split_string(string):
    # optional leading number, optional whitespace, optional word
    return re.search(r'(\d+)?\s*(\w+)?', string).groups()
>>> df[2].apply(split_string).apply(pd.Series)
      0   1
0    40  RF
1  None  RM
2    50  SD
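The call above only displays the two parts. To attach them to df as columns 2.1 and 2.2 (the layout of df_exp), one option, sketched here under the assumption that plain Python lists are acceptable column values, is to transpose the per-row tuples with zip:

```python
import re
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([["Joe", 25, "40 RF"], ["Sam", 5, "RM"], ["Roy", 8, "50 SD"]]), columns=[0, 1, 2])

def split_string(string):
    # optional leading number, optional whitespace, optional word
    return re.search(r'(\d+)?\s*(\w+)?', string).groups()

# map yields one (number, code) tuple per row; zip(*...) regroups them into
# one tuple of numbers and one tuple of codes
numbers, codes = zip(*df[2].map(split_string))
df[2.1] = list(numbers)
df[2.2] = list(codes)
print(df)
```

This avoids relying on how pandas aligns a DataFrame assigned to a list of new column labels, which has varied between versions.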
Old answer:
You can use str.extract to accomplish what you want:
>>> df[2].str.extract(r'(\d+)?\s*(\w+)?')
     0   1
0   40  RF
1  NaN  RM
2   50  SD
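A minimal sketch of joining the str.extract result back onto df under the target labels 2.1 and 2.2 follows. Converting the number column with pd.to_numeric is an optional extra step not in the original answer; it assumes a float column with NaN for missing numbers is acceptable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([["Joe", 25, "40 RF"], ["Sam", 5, "RM"], ["Roy", 8, "50 SD"]]), columns=[0, 1, 2])

# One output column per capture group (labelled 0 and 1); unmatched groups become NaN
parts = df[2].str.extract(r'(\d+)?\s*(\w+)?')
parts.columns = [2.1, 2.2]
# Optional: turn "40"/"50" into numbers; the missing value stays NaN
parts[2.1] = pd.to_numeric(parts[2.1])
# Attach the new columns by row index
df = df.join(parts)
print(df)
```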
Upvotes: 1