Randy Morrison

Reputation: 95

Split a DataFrame column by applying a function in Python

I have a DataFrame (see example df) and I need to split column 2 into two columns (see example df_exp).

import numpy as np
import pandas as pd

# given df
df = pd.DataFrame(np.array([["Joe", 25, "40 RF"], ["Sam", 5, "RM"], ["Roy", 8, "50 SD"]]), columns=[0, 1, 2])
# expected df
df_exp = pd.DataFrame(np.array([["Joe", 25, "40 RF", 40, "RF"], ["Sam", 5, "RM", None, "RM"], ["Roy", 8, "50 SD", 50, "SD"]]), columns=[0, 1, 2, 2.1, 2.2])

I have the following function:

def split_string(string):
    if string[0].isnumeric():
        sep = string.split(" ",1)
        return sep[0], sep[1]
    else:
        return None, string

I tried to apply it but got an error. What is the best way to split a column using a function?

df[[21, 2.2]] = df.apply(lambda x: split_string(df.ix[:, 2]), axis = 1)

Upvotes: 1

Views: 135

Answers (1)

Corralien

Reputation: 120479

import re

def split_string(string):
    # Raw string so that \d, \s and \w are not treated as string escapes.
    return re.search(r'(\d+)?\s*(\w+)?', string).groups()

>>> df[2].apply(split_string).apply(pd.Series)
      0   1
0    40  RF
1  None  RM
2    50  SD
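
The attempt in the question most likely fails for two reasons: .ix was removed in pandas 1.0, and the lambda hands the whole column to split_string on every row instead of a single value. A minimal sketch of one way around that, assuming the question's own split_string and its column labels 2.1 and 2.2 (the names pairs, parts and result are only illustrative):

```python
import pandas as pd

def split_string(string):
    # Same rule as in the question: a leading digit means "number, rest".
    if string[0].isnumeric():
        number, rest = string.split(" ", 1)
        return number, rest
    return None, string

df = pd.DataFrame(
    [["Joe", 25, "40 RF"], ["Sam", 5, "RM"], ["Roy", 8, "50 SD"]],
    columns=[0, 1, 2],
)

# Apply the function to the single column, one value at a time.
pairs = df[2].apply(split_string)

# Turn the Series of 2-tuples into a two-column frame aligned on df's index.
parts = pd.DataFrame(pairs.tolist(), index=df.index, columns=[2.1, 2.2])

# Append the new columns to the right of the original ones.
result = pd.concat([df, parts], axis=1)
print(result)
```

Building the frame from pairs.tolist() avoids the second .apply(pd.Series) call, which tends to be slow on large frames.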

Old answer:
You can use str.extract with the same pattern to accomplish what you want, without a Python-level function:

>>> df[2].str.extract(r'(\d+)?\s*(\w+)?')
     0   1
0   40  RF
1  NaN  RM
2   50  SD
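
To get from this output to something like df_exp, the extracted columns still need to be renamed and attached to df. A rough sketch, assuming the question's labels 2.1 and 2.2 and assuming the first part should end up numeric (the pd.to_numeric step is my addition, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame(
    [["Joe", 25, "40 RF"], ["Sam", 5, "RM"], ["Roy", 8, "50 SD"]],
    columns=[0, 1, 2],
)

# Vectorised split: optional leading digits, optional whitespace, then the code.
parts = df[2].str.extract(r'(\d+)?\s*(\w+)?')
parts.columns = [2.1, 2.2]

# Rows without a leading number come back as NaN, so the numeric
# column becomes float rather than int.
parts[2.1] = pd.to_numeric(parts[2.1])

# join aligns on the index, which is unchanged from df.
result = df.join(parts)
print(result)
```

Because missing values are stored as NaN, 40 and 50 show up as 40.0 and 50.0; a nullable integer dtype would be one option if whole numbers matter.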

Upvotes: 1
