Reputation: 2689
I have a dataframe with a column containing hundreds of rows of strings such as:
nh sh sl hhlh lsl s h h lhlll hh l sh hl sl l shhllh sl h shhl hhl ll s s lhhlh lhl sl s sh l shhlll h hl hhl sllh ll s hh sl hhlh sl s sl l hl hhl lhhllh sl nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll
n s s s s s s s s s h s sl sl s s sh sl s nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll
nhlhh n sh sll hh shl lhh s s hh sl hl hhlh lhhl sl lh s slhllhs lh s sh sl h shhl sl sl hhl h sh slsll hh lhh hlll hhl ll hhs s s sll hs lh hsl hll h s sl hh s s lhhlll lhl hl hhs hhhlll hhl hl hhs hlllh hs sh sl hll hh shhlh ll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll nsshll
I want to remove the nsshll that is repeatedly appended to the end of every row. For example, the three rows above would become:
nh sh sl hhlh lsl s h h lhlll hh l sh hl sl l shhllh sl h shhl hhl ll s s lhhlh lhl sl s sh l shhlll h hl hhl sllh ll s hh sl hhlh sl s sl l hl hhl lhhllh sl
n s s s s s s s s s h s sl sl s s sh sl s
nhlhh n sh sll hh shl lhh s s hh sl hl hhlh lhhl sl lh s slhllhs lh s sh sl h shhl sl sl hhl h sh slsll hh lhh hlll hhl ll hhs s s sll hs lh hsl hll h s sl hh s s lhhlll lhl hl hhs hhhlll hhl hl hhs hlllh hs sh sl hll hh shhlh ll
I've tried to remove them using rstrip
nhl_pred['nhl-predicted'] = nhl_pred['nhl-predicted'].str.rstrip(' nsshll')
but that clears out the entire string and returns an empty column.
I then tried with regex
nhl_pred['nhl-predicted'] = nhl_pred['nhl-predicted'].str.replace(r' nsshll$', '')
But this either does nothing or removes only the very last substring while leaving the rest.
How would I achieve my desired result?
Thanks
Upvotes: 1
Views: 47
Reputation: 51683
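When using str.rstrip(' nsshll') you provide a set of characters to remove, not a string (suffix). Every character in your rows is a space or one of n, s, h, l, so rstrip keeps stripping until nothing is left. That is why all your content gets deleted.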
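A minimal sketch with plain Python strings that shows the character-set behaviour of rstrip (the sample rows are shortened, hypothetical versions of the question's data):

```python
# Hypothetical sample row, shortened from the question's data.
row = "n s s s h s sl nsshll nsshll nsshll"

# str.rstrip treats its argument as a SET of characters {' ', 'n', 's', 'h', 'l'},
# not as a suffix, and keeps stripping while the last character is in that set.
stripped = row.rstrip(" nsshll")
print(repr(stripped))  # every character is in the set, so nothing survives

# Only a character outside the set stops the stripping.
print("abc sl nsshll".rstrip(" nsshll"))  # prints: abc
```

Note that the "sl" before the repeated suffix is also eaten in the second example, so rstrip is the wrong tool even for rows that would not be emptied completely.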
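You can use a regex and apply the + quantifier (one or more occurrences) to your pattern. Put the pattern into a non-capturing group (?: ...) so that + applies to the whole pattern and not just to the last 'l'. The $ anchor then restricts the match to the run of repeats at the end of the string; your ' nsshll$' matched only the final occurrence. Also, since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed, which is why your attempt may also have done nothing. The call becomes: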
nhl_pred['nhl-predicted'] = nhl_pred['nhl-predicted'].str.replace(r'(?: nsshll)+$', '', regex=True)
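To see the difference the group makes, here is a sketch using Python's re module on shortened, hypothetical versions of the three rows. It assumes the pattern behaves the same under re.sub as it does inside pandas' regex replacement:

```python
import re

# Shortened versions of the question's rows (hypothetical sample data).
rows = [
    "nh sh sl hhlh lsl s nsshll nsshll nsshll",
    "n s s s s h s sl sl s nsshll nsshll",
    "nhlhh n sh sll hh shl lhh nsshll nsshll nsshll nsshll",
]

# Without a group, '+' binds only to the final 'l', so ' nsshll+$'
# removes just the last ' nsshll' at the end of each row.
no_group = [re.sub(r" nsshll+$", "", r) for r in rows]

# With a non-capturing group, '+' repeats the whole ' nsshll' unit,
# and '$' anchors that run of repeats to the end of the string.
grouped = [re.sub(r"(?: nsshll)+$", "", r) for r in rows]

for before, after in zip(no_group, grouped):
    print(repr(before))
    print(repr(after))
```

In pandas, the grouped pattern is what goes into Series.str.replace; pass regex=True explicitly so it is interpreted as a regular expression regardless of the pandas version.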
Upvotes: 2