Reputation: 2296
There are similar answers, but I could not apply them to my own case. I want to get rid of characters that are forbidden in Windows directory names in a column of my pandas dataframe. I tried to use something like:
df1['item_name'] = "".join(x for x in df1['item_name'].rstrip() if x.isalnum() or x in [" ", "-", "_"]) if df1['item_name'] else ""
Assume I have a dataframe like this
item_name
0 st*back
1 yhh?\xx
2 adfg%s
3 ghytt&{23
4 ghh_h
I want to get:
item_name
0 stback
1 yhhxx
2 adfgs
3 ghytt23
4 ghh_h
How could I achieve this? Note: I scraped the data from the internet earlier, and used the following code for the older version:
item_name = "".join(x for x in item_name.text.rstrip() if x.isalnum() or x in [" ", "-", "_"]) if item_name else ""
Now I have new observations for the same items and I want to merge them with the older observations, but I forgot to apply the same method when I rescraped.
Upvotes: 2
Views: 3236
Reputation: 215047
You could summarize the condition as a negated character class and use str.replace to remove the unwanted characters. Here \w stands for word characters (alphanumerics plus _), \s stands for whitespace, and - is a literal dash. With ^ at the start of the character class, [^\w\s-] matches any character that is neither alphanumeric nor one of [" ", "-", "_"], so you can use the replace method to remove them:
df.item_name.str.replace(r"[^\w\s-]", "", regex=True)
#0 stback
#1 yhhxx
#2 adfgs
#3 ghytt23
#4 ghh_h
#Name: item_name, dtype: object
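For the merge described in the question, it may help to check that the regex gives the same result as the rule used in the older scrape. Below is a minimal sketch under that assumption: clean_name is a hypothetical helper that mirrors the question's original filter, and the dataframe is rebuilt from the sample shown in the question. The two rules are not strictly identical in general (for example, \s also keeps tabs and newlines, while the original filter only keeps a plain space), so treat the comparison as a sanity check on your own data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"item_name": ["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"]}
)


def clean_name(value):
    # Hypothetical helper mirroring the question's original filter:
    # keep alphanumerics plus space, dash and underscore.
    # Non-string values (e.g. NaN) become an empty string.
    if not isinstance(value, str):
        return ""
    return "".join(ch for ch in value.rstrip() if ch.isalnum() or ch in " -_")


# Vectorised regex version; regex=True is passed explicitly because
# newer pandas versions treat the pattern as a literal string by default.
by_regex = df["item_name"].str.replace(r"[^\w\s-]", "", regex=True)

# Element-wise version using the original rule.
by_helper = df["item_name"].map(clean_name)

print(by_regex.tolist())
print(by_helper.equals(by_regex))
```

If the two results agree on your data, either one can be assigned back with df["item_name"] = ... before merging.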
Upvotes: 4
Reputation: 294488
If you have a properly escaped list of characters, you can pass it to replace with regex=True:
lst = [r'\\', r'\*', r'\?', '%', '&', r'\{']
df.replace(lst, '', regex=True)
item_name
0 stback
1 yhhxx
2 adfgs
3 ghytt23
4 ghh_h
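Escaping the list by hand is easy to get wrong, so one option is to let re.escape do it. This is only a sketch: the unwanted list below is an assumption taken from the characters that appear in the sample data, not a complete list of what Windows forbids.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"item_name": ["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"]}
)

# Plain (unescaped) characters to drop; assumed from the sample data.
unwanted = ["\\", "*", "?", "%", "&", "{"]

# re.escape turns each character into a pattern that matches it literally.
patterns = [re.escape(ch) for ch in unwanted]

# Replace every match of any pattern with an empty string.
result = df.replace(patterns, "", regex=True)
print(result)
```

Note that df.replace acts on every column of the dataframe, so select the column first if only item_name should be cleaned.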
Upvotes: 1
Reputation: 38415
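Try re.sub with \W, which matches any non-word character. Note that this also removes spaces and dashes: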
import re
df.item_name.apply(lambda x: re.sub(r'\W+', '', x))
0 stback
1 yhhxx
2 adfgs
3 ghytt23
4 ghh_h
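The sample data has no spaces or dashes, so \W and [^\w\s-] give the same output there. A small sketch of where they differ, using made-up names (the values below are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical names containing a dash, a space and a forbidden character.
names = pd.Series(["blue-ray disc*", "ghh_h"])

# \W+ removes every non-word character, including the dash and the space.
strict = names.str.replace(r"\W+", "", regex=True)

# [^\w\s-] keeps whitespace and dashes and removes only the rest.
lenient = names.str.replace(r"[^\w\s-]", "", regex=True)

print(strict.tolist())
print(lenient.tolist())
```

If the older observations were cleaned with a rule that keeps spaces and dashes, the second pattern is probably the one that will line up for the merge.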
Upvotes: 3