Reputation: 4618
Imagine I have the following pandas series:
tmp = pd.Series(['k.; mlm', '(+).', 'a;b/c', '!".: abc', 'abc dfg', 'qwert@'])
I want to remove, from every element, the words that consist only of punctuation, using a regex. I was trying to use something like:
tmp.str.replace(regex, '')
My final series would be:
tmp = pd.Series(['k.; mlm', '', 'a;b/c', 'abc', 'abc dfg', 'qwert@'])
Edit: by punctuation I mean punctuation as defined in the Unicode tables.
Upvotes: 3
Views: 118
Reputation: 627292
It looks as if you planned to clear a field value (replace it all with an empty string) if the whole string consists of punctuation.
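You may do that with the following (regex=True is passed explicitly because, since pandas 2.0, Series.str.replace treats the pattern as a literal string by default):
tmp.str.replace(r'^(?:[^\w\s]|_)+$', '', regex=True)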
See the regex demo. NOTE: If you only plan to clear the value of rows that consist only of ASCII punctuation, you may use string.punctuation (this needs import re and import string):
tmp.str.replace(f"^[{''.join(map(re.escape, string.punctuation))}]+$", '', regex=True)
print(f"[{''.join(map(re.escape, string.punctuation))}]") shows
[!"\#\$%\&'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`\{\|\}\~]
(see its online demo). As expected, it does not match punctuation like ’, ‘, “, ”, «, », etc.
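To make the difference between the two patterns concrete, here is a minimal sketch that uses only the standard library. The names ascii_punct and unicode_punct and the extra sample strings are assumptions added for illustration; they are not part of the original answer.

```python
import re
import string

# Hypothetical names: one pattern built from string.punctuation (ASCII only),
# one built from the negated word/space class used above.
ascii_punct = re.compile(f"^[{''.join(map(re.escape, string.punctuation))}]+$")
unicode_punct = re.compile(r'^(?:[^\w\s]|_)+$')

# '(+).' is ASCII punctuation, the next two are non-ASCII quotes, 'abc' is a word.
samples = ['(+).', '«»', '“”', 'abc']

for s in samples:
    # Show whether each pattern would clear the whole string.
    print(repr(s), bool(ascii_punct.search(s)), bool(unicode_punct.search(s)))
```

Only the second pattern treats strings such as '«»' as punctuation-only, which matters if the data contains typographic quotes.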
Details
^ - start of string
(?: - start of a non-capturing group
[^ - start of a negated character class (it will match all chars BUT the ones specified inside it):
\w - word chars (any Unicode letters, digits, and _)
\s - any Unicode whitespace
]+ - end of the class, + repeats it 1 or more times
| - or
_ - an underscore
) - end of a group
$ - end of string.
Pandas test:
>>> tmp.str.replace(r'^(?:[^\w\s]|_)+$', '', regex=True)
0 k.; mlm
1
2 a;b/c
3 !".: abc
4 abc dfg
5 qwert@
dtype: object
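The expected output in the question turns '!".: abc' into 'abc', which a whole-string pattern leaves untouched. If the goal is to drop punctuation-only words inside a string, one possible sketch follows. It assumes that a "word" is a whitespace-delimited token, and the name PUNCT_ONLY_WORD is hypothetical.

```python
import pandas as pd

tmp = pd.Series(['k.; mlm', '(+).', 'a;b/c', '!".: abc', 'abc dfg', 'qwert@'])

# Hypothetical pattern name. A token is removed when it is bounded by
# whitespace or the string edges and holds only non-word, non-space
# characters or underscores; whitespace right after it is consumed too.
PUNCT_ONLY_WORD = r'(?<!\S)(?:[^\w\s]|_)+(?!\S)\s*'

# regex=True is explicit; str.strip() removes whitespace left at the edges.
result = tmp.str.replace(PUNCT_ONLY_WORD, '', regex=True).str.strip()
print(result.tolist())
# ['k.; mlm', '', 'a;b/c', 'abc', 'abc dfg', 'qwert@']
```

The lookarounds (?<!\S) and (?!\S) keep punctuation attached to a word, such as the '.;' in 'k.;' or the '@' in 'qwert@', from being removed.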
Upvotes: 1
Reputation: 88285
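You could use str.contains with the pattern [^\W] to find the strings that contain at least one word character (a letter, digit, or underscore), i.e. at least one character that is neither punctuation nor whitespace, and blank out the rest.
Note that [] matches any character contained in the set, and adding ^ at the beginning inverts it, so that every character not in the set is matched. Since \W itself means "any non-word character", [^\W] is equivalent to \w.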
tmp.where(tmp.str.contains(r'[^\W]'), '')
0 k.; mlm
1
2 a;b/c
3 !".: abc
4 abc dfg
5 qwert@
dtype: object
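A small sketch of one consequence of this approach: because [^\W] is the same as \w, an underscore counts as a word character, so a string made only of underscores is kept here, whereas the ^(?:[^\w\s]|_)+$ pattern would clear it. The extra '__' element is an assumption added for illustration.

```python
import pandas as pd

# Same data as in the question, plus a hypothetical underscore-only element.
tmp = pd.Series(['k.; mlm', '(+).', 'a;b/c', '!".: abc', 'abc dfg', 'qwert@', '__'])

# True where the string has at least one letter, digit, or underscore.
has_word_char = tmp.str.contains(r'\w')

# Keep strings with a word character, blank out the others.
result = tmp.where(has_word_char, '')
print(result.tolist())
# ['k.; mlm', '', 'a;b/c', '!".: abc', 'abc dfg', 'qwert@', '__']
```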
Upvotes: 2
Reputation: 26676
IIUC
tmp.replace('[()+!".:]', '', regex=True).to_list()
OUTCOME
['k; mlm', '', 'a;b/c', ' abc', 'abc dfg', 'qwert@']
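Explanation
[] in this case contains the characters to match.
Series.replace replaces the values given in to_replace with value. I set regex=True because the pattern is a regular expression.
Note that this removes the listed characters wherever they occur, which is why 'k.; mlm' becomes 'k; mlm'.
Finally, I convert the result to a list with Series.to_list().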
Upvotes: 1
Reputation: 38415
You can use Series.replace with a negative-lookahead regex. The pattern matches only strings in which no character is a word character (\w: letters, digits, and underscore), and those strings are replaced with an empty string.
tmp.replace(r'^((?!\w).)*$', '', regex=True)
0 k.; mlm
1
2 a;b/c
3 !".: abc
4 abc dfg
5 qwert@
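As a cross-check, the lookahead pattern can be compared with the shorter ^\W*$ using only the standard library. The sketch below assumes single-line strings (the two patterns can differ when a string contains newlines, because . does not match a newline by default); the names lookahead and simple and the added empty-string sample are hypothetical.

```python
import re

# Hypothetical names for the two patterns being compared.
lookahead = re.compile(r'^((?!\w).)*$')
simple = re.compile(r'^\W*$')

# Data from the question, plus an empty string as an extra edge case.
samples = ['k.; mlm', '(+).', 'a;b/c', '!".: abc', 'abc dfg', 'qwert@', '']

# Both patterns agree on every single-line sample.
for s in samples:
    assert bool(lookahead.search(s)) == bool(simple.search(s))

# Strings that would be replaced with '' by the lookahead pattern.
matches = [s for s in samples if lookahead.search(s)]
print(matches)
# ['(+).', '']
```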
Upvotes: 1