Reputation: 27
I have a data frame whose rows contain different variations of a comma-separated string. Rather than repeatedly writing variations of code such as df.replace('Word,', ''), I am looking for a simpler way to replace these variations in Python. I have heard about regex but am having a difficult time understanding it.
One example I am looking into is df.column.str.replace('Word,?', ''), which is supposed to replace all variations of "Word" regardless of comma position. However, I am unsure how this works. Any help in understanding replacement with regex would be greatly appreciated. Thank you in advance.
Example:
'Word, foo, bar'
'Word'
'foo, bar, Word'
'foo, Word, bar'
Desired Output:
'foo, bar'
''
'foo, bar'
'foo, bar'
Upvotes: 1
Views: 598
Reputation: 21
df.replace(to_replace='Word, ?|(, )?Word', value='', regex=True)
The .replace() method does the required work this way.
to_replace is the regular expression, and it must be given as a string.
'Word, ?' matches "Word," together with the space after it, if there is one, so that no leading space is left behind when "Word" starts the string. It cannot match a trailing ", Word" or a bare "Word".
To match those, "|" (or) adds a second alternative, "(, )?Word". Here ? matches 0 or 1 occurrence of ", " (a comma and one space), so both ", Word" at the end of a string and a string consisting only of "Word" are matched.
value='' is the text each match is replaced with.
regex=True tells pandas to treat the to_replace parameter as a regular expression.
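The alternation described above can be checked outside pandas with Python's standard re module. This is a minimal sketch rather than the answerer's code: the pattern uses \s* in place of a literal space (an assumption, so that any whitespace after the comma is consumed), and strip_word is a hypothetical helper name.

```python
import re

# Assumed variant of the pattern: "Word," plus trailing whitespace,
# OR an optional ", " prefix followed by "Word".
PATTERN = re.compile(r'Word,\s*|(,\s*)?Word')


def strip_word(text):
    """Remove 'Word' and the comma/whitespace attached to it (hypothetical helper)."""
    return PATTERN.sub('', text)


samples = ['Word, foo, bar', 'Word', 'foo, bar, Word', 'foo, Word, bar']
results = [strip_word(s) for s in samples]
print(results)
```

Note that the pattern is not anchored on word boundaries, so a longer token such as 'Words' would also be affected; adding \b around Word is one way to guard against that.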
Upvotes: 1
Reputation: 8033
You can do it as shown below.
Input
import pandas as pd

df = pd.DataFrame([[1, 'Word, foo, bar'],
                   [2, 'Word'],
                   [3, 'foo, bar, Word'],
                   [4, 'foo, Word, bar']], columns=['id', 'text'])
id text
1 Word, foo, bar
2 Word
3 foo, bar, Word
4 foo, Word, bar
Code to replace the text 'Word' together with an adjacent comma and space, if any
df['text'] = df['text'].replace(r'Word(,\s)|(,\s)?Word', '', regex=True)
What is happening in the code
Word: matches the literal text 'Word'.
(,\s): matches a comma followed by a whitespace character (\s).
?: makes the preceding group optional. If the comma and space are present they are matched too; if not, just the text 'Word' is matched. So ? is pretty important here.
|: matches either of the two expressions. The first, Word(,\s), removes 'Word' along with the comma and space after it; the second, (,\s)?Word, is needed for row 3, where the comma and space come before 'Word'.
You can see detailed explanation here Regex Demo
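To see which part of each string the pattern actually consumes, the matches can be listed with the standard re module. This is only an illustrative sketch; the variable names rows and matched are made up for the example.

```python
import re

pattern = re.compile(r'Word(,\s)|(,\s)?Word')

rows = ['Word, foo, bar', 'Word', 'foo, bar, Word', 'foo, Word, bar']

# For each row, record the exact substring(s) the regex consumes.
matched = [[m.group(0) for m in pattern.finditer(row)] for row in rows]

for row, hits in zip(rows, matched):
    print(repr(row), '->', hits)
```

The first row is handled by the left-hand alternative (the comma and space follow 'Word'), while rows 3 and 4 are handled by the right-hand one (the comma and space precede it).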
Output
id text
1 foo, bar
2
3 foo, bar
4 foo, bar
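For the column-level form mentioned in the question, Series.str.replace accepts the same pattern. A minimal sketch, assuming pandas is installed and reusing the column name text from above; regex=True is passed explicitly because str.replace treats the pattern as a literal string by default in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'text': ['Word, foo, bar', 'Word',
                            'foo, bar, Word', 'foo, Word, bar']})

# regex=True makes pandas treat the pattern as a regular expression
# instead of a literal substring.
cleaned = df['text'].str.replace(r'Word(,\s)|(,\s)?Word', '', regex=True)
print(cleaned.tolist())
```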
Upvotes: 0