Reputation: 73
I am working on a DataFrame in which one of the columns has values like this:
field |
---|
marketable_email_status_m10 |
email_availability_status_m11 |
ending_ar_60_to_89_dpd_m11 |
email_availability_status_m1 |
I want my final output to split each string into two columns, as below:
field | text1 | text2 |
---|---|---|
marketable_email_status_m10 | marketable_email_status | m10 |
email_availability_status_m11 | email_availability_status | m11 |
ending_ar_60_to_89_dpd_m11 | ending_ar_60_to_89_dpd | m11 |
email_availability_status_m1 | email_availability_status | m1 |
I have been able to produce column 3 (text2), but I'm not sure how to get column 2 (text1).
Upvotes: 3
Views: 65
Reputation: 862611
Use Series.str.rsplit with n=1 to split on the last _:
df[['text1','text2']] = df['field'].str.rsplit('_', n=1, expand=True)
print (df)
field text1 text2
0 marketable_email_status_m10 marketable_email_status m10
1 email_availability_status_m11 email_availability_status m11
2 ending_ar_60_to_89_dpd_m11 ending_ar_60_to_89_dpd m11
3 email_availability_status_m1 email_availability_status m1
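A self-contained sketch of the rsplit approach, rebuilding the sample data from the question. For contrast it also runs str.split with n=1, which splits at the first _ instead of the last, showing why the reverse split is needed for these values (the name first_split is just illustrative).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"field": [
    "marketable_email_status_m10",
    "email_availability_status_m11",
    "ending_ar_60_to_89_dpd_m11",
    "email_availability_status_m1",
]})

# rsplit with n=1 splits only at the LAST underscore -> two columns (0 and 1)
df[["text1", "text2"]] = df["field"].str.rsplit("_", n=1, expand=True)

# For contrast: split with n=1 splits at the FIRST underscore instead
first_split = df["field"].str.split("_", n=1, expand=True)

print(df)
print(first_split)
```

With split, the first row becomes "marketable" and "email_status_m10", which is not what the question asks for; rsplit keeps everything before the final _ together.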
Upvotes: 2
Reputation: 133508
Using the extract function, please try the following.
df[["text1","text2"]] = df['field'].str.extract(r'^(.*)_(.*)$')
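Explanation:
- Apply the str.extract function to the DataFrame's field column.
- Assign the result to two new columns, text1 and text2.
- The regex ^(.*)_(.*)$ has two capturing groups: because .* is greedy, the first group captures everything up to the last _ and the second group captures the rest of the value (as per OP's requirement).
- The first group's value goes into text1 and the second group's value into text2.

Output will be as follows: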
field text1 text2
0 marketable_email_status_m10 marketable_email_status m10
1 email_availability_status_m11 email_availability_status m11
2 ending_ar_60_to_89_dpd_m11 ending_ar_60_to_89_dpd m11
3 email_availability_status_m1 email_availability_status m1
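A small sketch of why the greedy .* in the first group matters. It compares the pattern from this answer with a non-greedy variant (.*?), which is an illustrative swap and not part of the answer; the extra value "nounderscore" is hypothetical test data.

```python
import pandas as pd

s = pd.Series([
    "marketable_email_status_m10",
    "ending_ar_60_to_89_dpd_m11",
    "nounderscore",  # hypothetical value with no underscore at all
])

# Greedy .* in group 1 backtracks only as far as the LAST underscore
greedy = s.str.extract(r"^(.*)_(.*)$")

# Non-greedy .*? stops at the FIRST underscore instead
lazy = s.str.extract(r"^(.*?)_(.*)$")

print(greedy)
print(lazy)
```

A value with no _ does not match the pattern, so str.extract returns NaN in both columns for that row.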
Upvotes: 3