Karan Kundra

Reputation: 71

Split a DataFrame column on the first occurrence of a character

I'm very new to pandas and would appreciate some guidance.

df.head():

feature_category
transgender_gender
725-750_crif_score
<25_age
<575_crif_score 

I want to create a separate column containing the part of each string after the first underscore.

df:

feature_category                feature_name
transgender_gender              gender
725-750_crif_score              crif_score
<25_age                         age
<575_crif_score                 crif_score

How can I achieve this result?

Upvotes: 1

Views: 31

Answers (2)

Corralien

Reputation: 120409

Use str.extract:

df['feature_name'] = df['feature_category'].str.extract('_(.*)')
print(df)

# Output
     feature_category feature_name
0  transgender_gender       gender
1  725-750_crif_score   crif_score
2             <25_age          age
3     <575_crif_score   crif_score

The pattern _(.*) matches the first underscore and captures every character after it, so str.extract returns that captured group.
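As a self-contained sketch of the same idea, the snippet below rebuilds the sample frame from the question and adds one hypothetical row without an underscore (not part of the original data) to show what happens when the pattern does not match. Passing expand=False is an optional choice made here so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row with no underscore.
df = pd.DataFrame({
    "feature_category": [
        "transgender_gender",
        "725-750_crif_score",
        "<25_age",
        "<575_crif_score",
        "no-underscore",  # hypothetical: added only to show the no-match case
    ]
})

# The regex finds the first "_" and captures everything after it.
# expand=False makes str.extract return a Series for a single group.
df["feature_name"] = df["feature_category"].str.extract(r"_(.*)", expand=False)

feature_names = df["feature_name"].tolist()
print(df)
```

Rows where the pattern finds no underscore end up as NaN in feature_name, so they are easy to spot with isna().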

Upvotes: 1

user7864386

Reputation:

You could use the str.split method with the parameter n=1, which limits the number of splits to one. Then use the str accessor to select the second part:

df['feature_name'] = df['feature_category'].str.split('_', n=1).str[1]

Output:

     feature_category feature_name
0  transgender_gender       gender
1  725-750_crif_score   crif_score
2             <25_age          age
3     <575_crif_score   crif_score
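A possible variation, shown here only as a sketch: with expand=True, str.split returns a DataFrame whose columns hold the pieces, so the prefix and the remainder can both be kept. The n argument is passed by keyword because pandas 2.0 and later no longer accept it positionally. The column names prefix and feature_name are illustrative choices.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "feature_category": [
        "transgender_gender",
        "725-750_crif_score",
        "<25_age",
        "<575_crif_score",
    ]
})

# Split once on the first underscore; expand=True yields columns 0 and 1.
parts = df["feature_category"].str.split("_", n=1, expand=True)

df["prefix"] = parts[0]        # text before the first underscore
df["feature_name"] = parts[1]  # text after the first underscore

prefixes = df["prefix"].tolist()
feature_names = df["feature_name"].tolist()
print(df)
```

Because the split is limited to one, any later underscores (as in crif_score) stay inside the second piece.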

Upvotes: 1
