Nakkhatra

Reputation: 65

df.replace showing error without formatting the to_replace with str

I was trying to remove all the symbols from my wordlist, for which I made a pandas dataframe 'df'. I know there are easier ways, but I wanted to try my own way: I first listed the unique characters in an np.array, then used a for loop to separate the special characters from the alphabet and put those special characters into another np.array. My symbol array:

symbols= 
[['!']
 ['&']
 ["'"]
 ['(']
 [')']
 [',']
 ['-']
 ['.']
 ['1']
 ['2']
 ['3']
 [':']
 [';']
 ['?']
 ['[']
 [']']]

Next, I ran a for loop over each item in the symbol array and replaced it with an empty string using df.replace. (Before that, I added a space before each symbol and created symbolspace, to avoid an error when replacing '('.)

for symbol in symbolspace:
    df=df.str.replace(str(symbol),"", regex= True)   

My question: this did the job correctly except for '-'. At first I tried df.replace(symbol,"", regex= True) instead of df.replace(str(symbol),"", regex= True), and that gave me this error: missing ), unterminated subpattern. Why am I getting this error? All the entries of the array symbolspace are already strings (it shows str64) even if I don't use str(symbol). And why did it not work for '-' from the symbol np.array? It works when I write only df.replace('-',"",regex=True).

Upvotes: 0

Views: 110

Answers (1)

Ynjxsjmh

Reputation: 30032

But at first I tried this with df.replace(symbol, "", regex= True) that gave me this error: missing ), unterminated subpattern.

In this case, symbol is a one-element list like ['(']. pandas.DataFrame.replace() therefore treats each element of the list, here the bare (, as a pattern to replace with "".

You get the error because you enabled regex, and ( is a special character in regex: it opens a group and must be closed by a matching ).
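A minimal sketch of the same failure, using Python's re module directly rather than pandas (assuming, as is usual for regex=True, that the pattern ends up compiled by re). The sample string "hello (world)" and the variable names are made up for illustration.

```python
import re

pattern = "("  # the bare symbol, as in the question

# An unescaped "(" opens a group that is never closed, so compiling fails.
try:
    re.compile(pattern)
    compiled_ok = True
except re.error:
    compiled_ok = False

# re.escape turns "(" into "\(" so it matches a literal parenthesis.
escaped = re.escape(pattern)
cleaned = re.sub(escaped, "", "hello (world)")
print(compiled_ok, escaped, cleaned)
```

Escaping each symbol with re.escape before passing it as the pattern is one way to avoid the error. Another option, if no regex features are needed, is to pass regex=False so the symbol is treated as a literal string.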

why did it not work for '-' from the symbol np.array?

When you do str(symbol), the list is converted to a string like ['-']. Since you enabled regex mode with regex=True, everything inside [] is treated as a set of characters to match, and - inside square brackets is a special character that denotes a range. For example, [a-c] is the same as [abc]. So ['-'] is read as the range from ' to ', which matches only the apostrophe and never the hyphen.
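The character-class behaviour can be checked with a small sketch, again using re directly under the assumption that pandas hands the pattern to it unchanged. The sample text "it's well-known" is invented for the demonstration.

```python
import re

pattern = "['-']"  # what str(symbol) produces for the '-' entry

# Inside [...], the "-" between two apostrophes is a range from "'" to "'",
# so the class matches only the apostrophe and leaves the hyphen alone.
result_class = re.sub(pattern, "", "it's well-known")

# Escaping the bare symbol matches the literal hyphen instead.
result_escaped = re.sub(re.escape("-"), "", "it's well-known")
print(result_class)
print(result_escaped)
```

The same reasoning suggests why the other symbols appeared to work: a pattern such as ['!'] is a class containing ' and !, so it removes the symbol but also removes every apostrophe.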

Upvotes: 1
