Reputation: 510
I have a DF as shown below:
DF =
ID T R
1 A ",Oa+,,Li+,,Wa+"
1 A "Lo+,,Oa+,,Wa+"
1 A ",Li+,,Wa+"
I want to create a new column, Re, holding the values from R that run from the start of the string up to the delimiter "+", plus the values that follow each ",," delimiter up to the next "+". That is:
DF_New =
ID T R Re
1 A ",Oa+,,Li+,,Wa+" Oa,Li,Wa
1 A "Lo+,,Oa+,,Wa+" Lo,Oa,Wa
1 A ",Li+,,Wa+" Li,Wa
I need to change the following line of code to do that:
DF["Re"] = DF["R"].str.split('+').str[0]
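To show why that line falls short, here is a plain-Python sketch of what .str.split('+').str[0] does to each value. It assumes the quotes in the table are display only and not part of the data; the names samples and first_pieces are made up for this illustration:

```python
# The three R values from the question, without the surrounding quotes (an assumption).
samples = [",Oa+,,Li+,,Wa+", "Lo+,,Oa+,,Wa+", ",Li+,,Wa+"]

# Per-value equivalent of .str.split('+').str[0]:
# keep only the piece before the first "+".
first_pieces = [s.split("+")[0] for s in samples]

print(first_pieces)  # [',Oa', 'Lo', ',Li']
```

Only the first value survives, and any leading comma stays attached to it.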
Upvotes: 2
Views: 95
Reputation: 59274
If you need to strip only + and , (or other specific characters), use agg:
vals = '+,'
df.R.str.split(',').agg(lambda x: ', '.join(z.strip(vals) for z in x if z.strip(vals)))
0 Oa, Li, Wa
1 Lo, Oa, Wa
2 Li, Wa
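The same split-and-strip idea can be written as a named function, which is easier to test on single values. This is only a sketch: clean_tokens and its parameters are hypothetical names, and it joins with "," rather than ", " so the result matches the Re column in the question:

```python
def clean_tokens(cell, strip_chars="+,", sep=","):
    """Split one R value on commas, strip delimiter characters, drop empty pieces."""
    pieces = (piece.strip(strip_chars) for piece in cell.split(","))
    return sep.join(piece for piece in pieces if piece)


# The three R values from the question, assuming the quotes are not part of the data.
samples = [",Oa+,,Li+,,Wa+", "Lo+,,Oa+,,Wa+", ",Li+,,Wa+"]
results = [clean_tokens(s) for s in samples]

print(results)  # ['Oa,Li,Wa', 'Lo,Oa,Wa', 'Li,Wa']
```

It can then be applied to the column with DF["Re"] = DF["R"].map(clean_tokens).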
Upvotes: 2
Reputation: 51335
Based on your example, you could use str.findall to find all runs of word characters (the regex \w+ matches one or more letters, digits, or underscores), and str.join to join them together:
df['Re'] = df.R.str.findall(r'\w+').str.join(',')
>>> df
ID T R Re
0 1 A ,Oa+,,Li+,,Wa+ Oa,Li,Wa
1 1 A Lo+,,Oa+,,Wa+ Lo,Oa,Wa
2 1 A ,Li+,,Wa+ Li,Wa
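Note that \w+ breaks a value apart at any non-word character, so it only fits when the values are plain letters, digits, or underscores. If values may contain spaces or hyphens, one alternative is a pattern keyed on the delimiters instead. The sketch below uses plain re; the names TOKEN and extract and the "St-Louis" / "New York" sample are invented for illustration:

```python
import re

# A run of characters that are neither "," nor "+", immediately followed by "+".
TOKEN = re.compile(r"[^,+]+(?=\+)")


def extract(cell):
    """Join every delimiter-bounded token in one R value with commas."""
    return ",".join(TOKEN.findall(cell))


# The three R values from the question, assuming the quotes are not part of the data.
samples = [",Oa+,,Li+,,Wa+", "Lo+,,Oa+,,Wa+", ",Li+,,Wa+"]
extracted = [extract(s) for s in samples]
print(extracted)  # ['Oa,Li,Wa', 'Lo,Oa,Wa', 'Li,Wa']

# Hypothetical value with a hyphen and a space, to contrast the two patterns.
tricky = ",St-Louis+,,New York+"
print(re.findall(r"\w+", tricky))  # ['St', 'Louis', 'New', 'York']
print(TOKEN.findall(tricky))       # ['St-Louis', 'New York']
```

In pandas the same pattern can be passed straight to str.findall, for example df.R.str.findall(r'[^,+]+(?=\+)').str.join(',').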
Upvotes: 2