Reputation: 1059

Replace a word or set of letters from a string in a dataframe only if the string starts with that word

Suppose I have the following toy DataFrame, df:

Line          Sentence

1             A MAN TAUGHT ME HOW TO DANCE.
2             WE HAVE TO CHOOSE A CAKE. 
3             X RAYS CAN BE HARMFUL.
4             MY HERO IS MALCOLM X FROM THE USA.
5             THE BEST ACTOR IS JENNIFER A FULTON. 
6             A SOUND THAT HAS A BIG IMPACT.

If I were to do the following:

df['Sentence'] = df['Sentence'].str.replace('A ', '', regex=False)

this removes every occurrence of 'A ' from every sentence, including the mid-sentence 'A ' in Lines 2, 5, and 6. However, I only want 'A ' removed from sentences that start with 'A '. Similarly, I want to remove the 'X ' from Line 3, but not the X in MALCOLM X in Line 4.

The final output df should look like the following:

Line          Sentence

1             MAN TAUGHT ME HOW TO DANCE.
2             WE HAVE TO CHOOSE A CAKE. 
3             RAYS CAN BE HARMFUL.
4             MY HERO IS MALCOLM X FROM THE USA.
5             THE BEST ACTOR IS JENNIFER A FULTON. 
6             SOUND THAT HAS A BIG IMPACT.
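A minimal, self-contained sketch of the problem, assuming pandas is installed (the data is retyped from the table above). An unanchored replacement hits every match in the string, while a regex anchored with ^ only matches at the very start.

```python
import pandas as pd

df = pd.DataFrame({
    "Line": [1, 2, 3, 4, 5, 6],
    "Sentence": [
        "A MAN TAUGHT ME HOW TO DANCE.",
        "WE HAVE TO CHOOSE A CAKE.",
        "X RAYS CAN BE HARMFUL.",
        "MY HERO IS MALCOLM X FROM THE USA.",
        "THE BEST ACTOR IS JENNIFER A FULTON.",
        "A SOUND THAT HAS A BIG IMPACT.",
    ],
})

# Unanchored literal replacement: every "A " in every sentence is removed.
naive = df["Sentence"].str.replace("A ", "", regex=False)

# Anchored regex: "A " or "X " is removed only at the start of a string.
anchored = df["Sentence"].str.replace(r"^[AX] ", "", regex=True)

print(naive[1])     # WE HAVE TO CHOOSE CAKE.
print(anchored[1])  # WE HAVE TO CHOOSE A CAKE.
```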

Upvotes: 1

Answers (3)

wwnde

Reputation: 26676

Use str.replace with a pattern anchored to the start of the string (^), followed by the value and one or more whitespace characters (\s+). Code below:

df.Sentence = df.Sentence.str.replace(r'^A\s+|^X\s+', '', regex=True)

                               Sentence
0           MAN TAUGHT ME HOW TO DANCE.
1             WE HAVE TO CHOOSE A CAKE.
2                  RAYS CAN BE HARMFUL.
3    MY HERO IS MALCOLM X FROM THE USA.
4  THE BEST ACTOR IS JENNIFER A FULTON.
5          SOUND THAT HAS A BIG IMPACT.
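If you would rather not use a substitution regex, a boolean mask gives the same result for these rows. This is an alternative sketch, not part of the answer above: Series.str.match is anchored at the start of the string, and the two-character prefix (letter plus one space) is sliced off only where the mask is true. Unlike \s+, it assumes exactly one space after the letter.

```python
import pandas as pd

s = pd.Series([
    "A MAN TAUGHT ME HOW TO DANCE.",
    "WE HAVE TO CHOOSE A CAKE.",
    "X RAYS CAN BE HARMFUL.",
    "MY HERO IS MALCOLM X FROM THE USA.",
])

# True only where the string begins with "A " or "X " (str.match anchors at the start).
starts_with_prefix = s.str.match(r"[AX] ")

# Keep the original where the mask is False; drop the first 2 characters where it is True.
result = s.where(~starts_with_prefix, s.str[2:])
print(result.tolist())
```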

Upvotes: 2

Andrej Kesely

Reputation: 195438

You can use a regular expression:

df["Sentence"] = df["Sentence"].str.replace(r"^(?:A|X)\s+", "", regex=True)
print(df)

Prints:

   Line                              Sentence
0     1           MAN TAUGHT ME HOW TO DANCE.
1     2             WE HAVE TO CHOOSE A CAKE.
2     3                  RAYS CAN BE HARMFUL.
3     4    MY HERO IS MALCOLM X FROM THE USA.
4     5  THE BEST ACTOR IS JENNIFER A FULTON.
5     6          SOUND THAT HAS A BIG IMPACT.
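A note on the whitespace: a zero-width lookahead such as (?=\s) only checks that whitespace follows the letter; it does not remove it, so the result keeps a leading space. That space is easy to miss in a printed DataFrame because the column is right-aligned. The sketch below uses Python's re module on a single sentence to show the difference.

```python
import re

sentence = "A MAN TAUGHT ME HOW TO DANCE."

# The lookahead only asserts that whitespace follows; the space itself is not matched.
with_lookahead = re.sub(r"^(?:A|X)(?=\s)", "", sentence)

# \s+ consumes the whitespace, so it is removed together with the letter.
consuming = re.sub(r"^(?:A|X)\s+", "", sentence)

print(repr(with_lookahead))  # ' MAN TAUGHT ME HOW TO DANCE.'
print(repr(consuming))       # 'MAN TAUGHT ME HOW TO DANCE.'
```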

Upvotes: 3

Henry Ecker

Reputation: 35636

Use a regex that matches only at the start of the string:

df['Sentence'] = df['Sentence'].str.replace(r'^([AX] )', '', regex=True)

df:

   Line                              Sentence
0     1           MAN TAUGHT ME HOW TO DANCE.
1     2             WE HAVE TO CHOOSE A CAKE.
2     3                  RAYS CAN BE HARMFUL.
3     4    MY HERO IS MALCOLM X FROM THE USA.
4     5  THE BEST ACTOR IS JENNIFER A FULTON.
5     6          SOUND THAT HAS A BIG IMPACT.
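If the set of leading words grows beyond single letters, one option is to build the pattern from a list. This is a sketch: the word list and the second sample sentence are hypothetical. Each entry is passed through re.escape, and whitespace is required after the word, so "A" does not match the start of a word like "ACTORS".

```python
import re
import pandas as pd

# Hypothetical list of leading words to strip; extend as needed.
leading_words = ["A", "X", "THE"]

# Builds ^(?:A|X|THE)\s+ : one of the words at the start, then at least one whitespace char.
pattern = r"^(?:" + "|".join(map(re.escape, leading_words)) + r")\s+"

s = pd.Series([
    "A MAN TAUGHT ME HOW TO DANCE.",
    "ACTORS ARE HERE.",  # illustrative: starts with "A" but not with the word "A"
    "THE BEST ACTOR IS JENNIFER A FULTON.",
    "MY HERO IS MALCOLM X FROM THE USA.",
])

stripped = s.str.replace(pattern, "", regex=True)
print(stripped.tolist())
```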

Upvotes: 2
