Ben G

Reputation: 4338

How to split a dataframe column based on a character and retain that character?

I'm having trouble figuring out how to split a dataframe column on a character while retaining that character. Here's some example data:

import pandas as pd

df = pd.DataFrame(
    {"sexage" : ['m45', 'f43']}
)

What I'd like is a separate column with the male/female letter and a separate column with the age.

When I do df['sexage'].str.split('m|f', expand=True), there's no value in the first column. But when I do df['sexage'].str.split('(m|f)', expand=True) I get an extra blank column that I don't want.

I know I can select them by position with df['sexage'].str[0] and df['sexage'].str[1:] but I was wondering if I could do this with regex instead.

Upvotes: 1

Views: 44

Answers (1)

Quang Hoang

Reputation: 150735

Try str.extract, which returns one column per capture group:

df.sexage.str.extract(r'(\D+)(\d+)')

output:

    0   1
0   m   45
1   f   43
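
If named columns are wanted instead of 0 and 1, named capture groups are one option. The following is a minimal sketch, not part of the original answer: the column names sex and age and the narrower [mf] pattern are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"sexage": ["m45", "f43"]})

# Named groups (?P<name>...) become the column names of the result.
# "sex" and "age" are illustrative names chosen for this sketch.
parts = df["sexage"].str.extract(r"(?P<sex>[mf])(?P<age>\d+)")

# str.extract returns strings, so convert the age explicitly.
parts["age"] = parts["age"].astype(int)

# Attach the new columns to the original frame.
df = df.join(parts)
print(df)
```

Rows that do not match the pattern come back as NaN in both columns, in which case the astype(int) step would need adjusting.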

Upvotes: 2
