I was trying to remove all the symbols from my word list, which I stored in a pandas DataFrame 'df'. I know there are easier ways, but I wanted to try my own: I first listed the unique characters in a np.array, then used a for loop to separate the special characters from the letters and put them into another np.array. My symbol array:
symbols=
[['!']
['&']
["'"]
['(']
[')']
[',']
['-']
['.']
['1']
['2']
['3']
[':']
[';']
['?']
['[']
[']']]
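For anyone reproducing this, here is a minimal sketch of building such an array. It assumes the word list is a single column of strings; the variable name words and its contents are hypothetical, since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical word list standing in for the original data.
words = pd.Series(["don't", "(hello)", "well-known", "wait!"])

# Unique characters across all words, returned sorted by np.unique.
chars = np.unique(list("".join(words)))

# Keep only the non-alphabetic characters, shaped as a column like the array above.
symbols = chars[~np.char.isalpha(chars)].reshape(-1, 1)
print(symbols)
```

np.char.isalpha treats digits as non-letters, so digits such as 1, 2 and 3 would end up in the symbol array as well, as they do in the array above.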
Next, I ran a for loop over each item in the symbol array and removed it using df.replace. (Before that, I added an empty space before all the symbols and created symbolspace, to avoid an error when replacing '('.)
for symbol in symbolspace:
    df = df.str.replace(str(symbol), "", regex=True)
My question: this did the job correctly except for '-'. At first I tried df.replace(symbol, "", regex=True) instead of df.replace(str(symbol), "", regex=True), and that gave me this error: missing ), unterminated subpattern. Why am I getting this error? All the entries of the array symbolspace are already strings (the dtype shows str64), even if I don't use str(symbol). And why did it not work for '-' from the symbol np.array? It works when I write only df.replace('-', "", regex=True).
You first tried df.replace(symbol, "", regex=True) and got the error "missing ), unterminated subpattern".
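In this case, symbol is not a string. Iterating over a 2-D array yields its rows, so each symbol is a one-element array such as ['('], which behaves like a list. When to_replace is list-like, pandas.DataFrame.replace() treats every element as a separate pattern and replaces it with "".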
You get the error because you enabled regex, and ( is a special character in regex: it opens a group, which must be closed by a matching ).
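A minimal sketch reproducing the error with Python's re module directly, followed by one way around it: escaping the symbol with re.escape before passing it to str.replace. Passing regex=False instead would also match the symbol literally. The sample Series is hypothetical.

```python
import re

import pandas as pd

# An unescaped "(" opens a regex group that is never closed, so compiling fails.
try:
    re.compile("(")
except re.error as exc:
    error_message = str(exc)
print(error_message)  # missing ), unterminated subpattern at position 0

# re.escape("(") returns the pattern "\(", which matches a literal parenthesis.
words = pd.Series(["(hello)", "world"])  # hypothetical sample data
cleaned = words.str.replace(re.escape("("), "", regex=True)
print(cleaned.tolist())  # ['hello)', 'world']
```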
As for why it did not work for '-' from the symbol np.array:
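When you call str(symbol), the one-element array is converted to a string such as ['-']. Because regex=True is set, anything inside [] is treated as a set of characters to match, and inside square brackets - is a special character that denotes a range: [a-c] is the same as [abc]. So ['-'] is the range from ' to ', which matches only the apostrophe; the hyphen itself is never removed. The same mechanism explains why str(symbol) avoided the error for '(': ['('] is a character class matching either ( or ', and ( has no special meaning inside brackets.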
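A short sketch of both effects, using a small stand-in for the symbols array and hypothetical sample strings. The last part shows one way to keep the original loop: iterate over symbols.ravel(), which yields the individual strings instead of rows, and pass regex=False so each symbol is matched literally and no escaping is needed.

```python
import re

import numpy as np
import pandas as pd

# Small stand-in for the 16x1 symbols array.
symbols = np.array([["("], [")"], ["-"], ["'"]])

# Iterating over the 2-D array yields rows, so str() gives "['-']", not "-".
pattern = str(symbols[2])
print(pattern)  # ['-']

# "['-']" is a character class holding only the range ' to ', i.e. the apostrophe.
sub_result = re.sub(pattern, "", "well-known don't")
print(sub_result)  # well-known dont

# "['(']" matches "(" or "'", so apostrophes are removed as a side effect.
paren_result = re.sub(str(symbols[0]), "", "(don't)")
print(paren_result)  # dont)

# ravel() yields plain strings; regex=False replaces each one literally.
words = pd.Series(["(well-known)", "don't"])  # hypothetical sample data
for symbol in symbols.ravel():
    words = words.str.replace(symbol, "", regex=False)
print(words.tolist())  # ['wellknown', 'dont']
```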