Mi.

Reputation: 510

Splitting String in a Column multiple times

I have a DF as shown below:

DF =

ID   T    R
1    A    ",Oa+,,Li+,,Wa+"
1    A    "Lo+,,Oa+,,Wa+"
1    A    ",Li+,,Wa+"

I want to create a new column containing the tokens from R: the characters from the start of the string (ignoring a leading comma) up to the first "+" delimiter, and the characters after each ",," delimiter up to the next "+". That is:

DF_New =

ID   T    R                     Re
1    A    ",Oa+,,Li+,,Wa+"      Oa,Li,Wa
1    A    "Lo+,,Oa+,,Wa+"       Lo,Oa,Wa
1    A    ",Li+,,Wa+"           Li,Wa

I need to change the following line of code to do that:

DF["Re"] = DF["R"].str.split('+').str[0]

Upvotes: 2

Views: 95

Answers (2)

rafaelc

Reputation: 59274

If you need to strip only + and , (or other specific characters), use agg:

vals = '+,'
df.R.str.split(',').agg(lambda x: ', '.join(z.strip(vals) for z in x if z.strip(vals)))

0    Oa, Li, Wa
1    Lo, Oa, Wa
2        Li, Wa
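To make the split-and-strip logic easier to follow, here is a minimal plain-Python sketch of what the lambda does to a single value. The helper name extract_tokens and its strip_chars and sep parameters are hypothetical, introduced only for illustration; sep defaults to "," to match the output requested in the question, whereas the line above joins with ", ".

```python
def extract_tokens(value, strip_chars="+,", sep=","):
    """Split on commas, strip the delimiter characters, drop empty pieces."""
    # ",Oa+,,Li+" splits into ['', 'Oa+', '', 'Li+']; stripping leaves '', 'Oa', '', 'Li'
    pieces = (piece.strip(strip_chars) for piece in value.split(","))
    # Keep only the non-empty pieces and join them with the chosen separator
    return sep.join(piece for piece in pieces if piece)


# Sample values copied from the question
rows = [",Oa+,,Li+,,Wa+", "Lo+,,Oa+,,Wa+", ",Li+,,Wa+"]
results = [extract_tokens(row) for row in rows]
print(results)
```

Assuming R holds plain strings, the same helper could be passed to the column with df["R"].apply(extract_tokens).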

Upvotes: 2

sacuL

Reputation: 51335

Based on your example, you could use str.findall to find all runs of word characters (the regex \w+ matches one or more letters, digits, or underscores), and str.join to join them together:

df['Re'] = df.R.str.findall(r'(\w+)').str.join(',')

>>> df
   ID  T               R        Re
0   1  A  ,Oa+,,Li+,,Wa+  Oa,Li,Wa
1   1  A   Lo+,,Oa+,,Wa+  Lo,Oa,Wa
2   1  A       ,Li+,,Wa+     Li,Wa
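If the tokens might contain characters that \w does not match (a hyphen, for example), a possible variation is to match runs of anything that is not a delimiter. This is only a sketch: it rebuilds the sample frame from the question, and the pattern [^+,]+ assumes that + and , are the only delimiters that ever appear in R.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID": [1, 1, 1],
    "T": ["A", "A", "A"],
    "R": [",Oa+,,Li+,,Wa+", "Lo+,,Oa+,,Wa+", ",Li+,,Wa+"],
})

# [^+,]+ matches each run of characters that is neither '+' nor ','
# (assumption: those two characters are the only delimiters)
df["Re"] = df["R"].str.findall(r"[^+,]+").str.join(",")
print(df)
```

For the sample data this gives the same Re column as the \w+ version; the two only differ once a token contains something other than letters, digits, or underscores.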

Upvotes: 2
