Reputation: 1059
Assume I have the following toy DataFrame df:
Line Sentence
1 A MAN TAUGHT ME HOW TO DANCE.
2 WE HAVE TO CHOOSE A CAKE.
3 X RAYS CAN BE HARMFUL.
4 MY HERO IS MALCOLM X FROM THE USA.
5 THE BEST ACTOR IS JENNIFER A FULTON.
6 A SOUND THAT HAS A BIG IMPACT.
If I were to do the following:
df['Sentence'] = df['Sentence'].str.replace('A ',' ')
This would remove every 'A ' from all sentences. However, I only need the 'A ' removed from sentences that start with 'A '. Similarly, I would like to remove the 'X ' from Line 3, but not from Malcolm X in Line 4.
The final output df should look like the following:
Line Sentence
1 MAN TAUGHT ME HOW TO DANCE.
2 WE HAVE TO CHOOSE A CAKE.
3 RAYS CAN BE HARMFUL.
4 MY HERO IS MALCOLM X FROM THE USA.
5 THE BEST ACTOR IS JENNIFER A FULTON.
6 SOUND THAT HAS A BIG IMPACT.
Upvotes: 1
Views: 148
Reputation: 26676
Use str.replace with a pattern built from the start-of-string anchor (^), the value, and trailing whitespace (\s+). Code below:
df['Sentence'] = df['Sentence'].str.replace(r'^A\s+|^X\s+', '', regex=True)
Sentence
0 MAN TAUGHT ME HOW TO DANCE.
1 WE HAVE TO CHOOSE A CAKE.
2 RAYS CAN BE HARMFUL.
3 MY HERO IS MALCOLM X FROM THE USA.
4 THE BEST ACTOR IS JENNIFER A FULTON.
5 SOUND THAT HAS A BIG IMPACT.
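For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the toy frame from the question and applies the anchored alternation. It assigns the result back to the column and passes regex=True explicitly, since Series.str.replace treats the pattern as a literal string by default in pandas 2.0 and later. The frame construction is my own reconstruction of the question's data, not something the asker posted.

```python
import pandas as pd

# Reconstruction of the toy frame from the question (assumed layout).
df = pd.DataFrame({
    "Line": [1, 2, 3, 4, 5, 6],
    "Sentence": [
        "A MAN TAUGHT ME HOW TO DANCE.",
        "WE HAVE TO CHOOSE A CAKE.",
        "X RAYS CAN BE HARMFUL.",
        "MY HERO IS MALCOLM X FROM THE USA.",
        "THE BEST ACTOR IS JENNIFER A FULTON.",
        "A SOUND THAT HAS A BIG IMPACT.",
    ],
})

# Each alternative is anchored with ^, so only a leading "A " or "X " matches.
# regex=True is explicit because the default is False in pandas 2.0+.
df["Sentence"] = df["Sentence"].str.replace(r"^A\s+|^X\s+", "", regex=True)

print(df["Sentence"].tolist())
```

The "A" in "A CAKE" and "JENNIFER A FULTON" and the "X" in "MALCOLM X" are left alone because they are not at position 0 of the string.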
Upvotes: 2
Reputation: 195438
You can use a regular expression:
df["Sentence"] = df["Sentence"].str.replace(r"^(?:A|X)(?=\s)", "", regex=True)
print(df)
Prints:
Line Sentence
0 1 MAN TAUGHT ME HOW TO DANCE.
1 2 WE HAVE TO CHOOSE A CAKE.
2 3 RAYS CAN BE HARMFUL.
3 4 MY HERO IS MALCOLM X FROM THE USA.
4 5 THE BEST ACTOR IS JENNIFER A FULTON.
5 6 SOUND THAT HAS A BIG IMPACT.
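One detail worth checking: the lookahead (?=\s) only asserts that whitespace follows, it does not consume it, so the space after the removed letter stays at the front of the string. The printed frame hides this because of column alignment. A small sketch comparing the lookahead pattern with a variant that consumes the whitespace (the variable names here are just for illustration):

```python
import pandas as pd

s = pd.Series([
    "A MAN TAUGHT ME HOW TO DANCE.",
    "MY HERO IS MALCOLM X FROM THE USA.",
])

# Lookahead: removes the letter but keeps the space that follows it.
kept_space = s.str.replace(r"^(?:A|X)(?=\s)", "", regex=True)

# Consuming variant: removes the letter together with the whitespace.
consumed = s.str.replace(r"^(?:A|X)\s+", "", regex=True)

print(repr(kept_space[0]))  # ' MAN TAUGHT ME HOW TO DANCE.'
print(repr(consumed[0]))    # 'MAN TAUGHT ME HOW TO DANCE.'
```

If you prefer to keep the lookahead, chaining .str.lstrip() after the replace should give the same result as the consuming variant.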
Upvotes: 3
Reputation: 35636
Use a regex that matches only at the start of each string:
df['Sentence'] = df['Sentence'].str.replace(r'^([AX] )', '', regex=True)
df:
Line Sentence
0 1 MAN TAUGHT ME HOW TO DANCE.
1 2 WE HAVE TO CHOOSE A CAKE.
2 3 RAYS CAN BE HARMFUL.
3 4 MY HERO IS MALCOLM X FROM THE USA.
4 5 THE BEST ACTOR IS JENNIFER A FULTON.
5 6 SOUND THAT HAS A BIG IMPACT.
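If the set of leading tokens might grow beyond single letters, the character class [AX] no longer fits. A possible generalization, sketched under the assumption that the tokens live in a list (the name prefixes is hypothetical), is to build the alternation with re.escape so that any regex metacharacters in a token are treated literally:

```python
import re
import pandas as pd

# Hypothetical list of leading tokens to strip; extend as needed.
prefixes = ["A", "X"]

# Builds ^(?:A|X)\s+ : start anchor, any one token, then the whitespace after it.
pattern = r"^(?:" + "|".join(re.escape(p) for p in prefixes) + r")\s+"

s = pd.Series([
    "A MAN TAUGHT ME HOW TO DANCE.",
    "X RAYS CAN BE HARMFUL.",
    "THE BEST ACTOR IS JENNIFER A FULTON.",
])

cleaned = s.str.replace(pattern, "", regex=True)
print(cleaned.tolist())
```

Requiring \s+ after the token means a sentence such as "APPLES ARE RED." is not touched, because the "A" there is followed by a letter rather than whitespace.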
Upvotes: 2