edyvedy13

Reputation: 2296

Applying string operations to pandas data frame

There are similar answers, but I could not apply them to my own case. I want to remove the characters that are forbidden in Windows directory names from a column of my pandas DataFrame. I tried something like:

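# Fails: `if df1['item_name']` raises ValueError (the truth value of a Series is
# ambiguous), and a Series has no .rstrip() method anyway (only .str.rstrip()).
df1['item_name'] = "".join(x for x in df1['item_name'].rstrip() if x.isalnum() or x in [" ", "-", "_"]) if df1['item_name'] else ""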

Assume I have a DataFrame like this:

   item_name
0  st*back
1  yhh?\xx
2  adfg%s
3  ghytt&{23
4  ghh_h

I want to get:

   item_name
0  stback
1  yhhxx
2  adfgs
3  ghytt23
4  ghh_h

How can I achieve this? Note: I scraped the data from the internet earlier and used the following code for the older version:

item_name = "".join(x for x in item_name.text.rstrip() if x.isalnum() or x in [" ", "-", "_"]) if item_name else ""

Now I have new observations for the same items, and I want to merge them with the older observations. But I forgot to apply the same cleaning when I rescraped.
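A minimal sketch of reusing the original cleaning rule: wrap the expression from the scraping code in a plain function that takes a string (instead of a tag's .text) and apply it element-wise with Series.map, so the rescraped names are cleaned exactly like the old ones before merging. The function name clean_name, the old/new frames and their price columns are hypothetical.

```python
import pandas as pd


def clean_name(name):
    """Same rule as the scraping code: strip trailing whitespace, then keep
    only alphanumerics, spaces, dashes and underscores (hypothetical helper)."""
    if not isinstance(name, str) or not name:
        # Mirrors the `... if item_name else ""` fallback; also covers NaN/None.
        return ""
    return "".join(ch for ch in name.rstrip() if ch.isalnum() or ch in [" ", "-", "_"])


# Hypothetical data: old names were cleaned at scrape time, new ones were not.
old = pd.DataFrame({"item_name": ["stback", "yhhxx", "ghh_h"], "price": [1, 2, 3]})
new = pd.DataFrame({"item_name": ["st*back", "yhh?\\xx", "ghh_h"], "price": [4, 5, 6]})

# Apply the old per-string rule to every element of the new column.
new["item_name"] = new["item_name"].map(clean_name)

# With identical cleaning on both sides, the keys line up for the merge.
merged = old.merge(new, on="item_name", suffixes=("_old", "_new"))
print(merged)
```

Using the very same function on both datasets is the safest way to guarantee the merge keys agree; a regex can be faster but has to be checked against this rule.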

Upvotes: 2

Views: 3236

Answers (3)

akuiper

Reputation: 215047

You could express the condition as a negated character class and use str.replace to remove the matches. Here \w stands for word characters (alphanumerics plus _), \s stands for whitespace, and - is a literal dash. With ^ at the start of the class, [^\w\s-] matches any character that is not a word character, whitespace, or a dash, so you can remove those with the replace method:

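# Raw string avoids invalid-escape warnings; regex=True is required since
# pandas 2.0, where str.replace defaults to regex=False (literal matching).
df.item_name.str.replace(r"[^\w\s-]", "", regex=True)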

#0     stback
#1      yhhxx
#2      adfgs
#3    ghytt23
#4      ghh_h
#Name: item_name, dtype: object
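One caveat for the merge use case: \s matches any whitespace (tabs, newlines), while the original filter keeps only a plain space, and the original also applies rstrip() first. A minimal sketch that mirrors the old rule more closely uses [^\w -] instead; per the Python re documentation, \w on str patterns covers the characters str.isalnum() accepts plus the underscore. The sample frame is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame({"item_name": ["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"]})

# Strip trailing whitespace first (as the scraping code did), then drop every
# character that is not a word character (alnum or "_"), a space, or a dash.
df["item_name"] = df["item_name"].str.rstrip().str.replace(r"[^\w -]", "", regex=True)
print(df)
```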

Upvotes: 4

piRSquared

Reputation: 294488

If you have a properly escaped list of the characters to remove, pass it to replace with regex=True:

# Raw strings keep the regex escapes intact without invalid-escape warnings.
lst = [r'\\', r'\*', r'\?', '%', '&', r'\{']
df.replace(lst, '', regex=True)

  item_name
0    stback
1     yhhxx
2     adfgs
3   ghytt23
4     ghh_h
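Rather than escaping by hand, re.escape can produce the patterns. A minimal sketch, assuming the characters to drop are exactly those in the list above; if the only goal were valid Windows names, the reserved characters are < > : " / \ | ? *, so %, & and { would not strictly need removing.

```python
import re

import pandas as pd

df = pd.DataFrame({"item_name": ["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"]})

# Plain characters to remove; re.escape turns each into a literal-match pattern.
chars = ["\\", "*", "?", "%", "&", "{"]
lst = [re.escape(c) for c in chars]

# With regex=True, each pattern is substituted inside the strings.
cleaned = df.replace(lst, "", regex=True)
print(cleaned)
```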

Upvotes: 1

Vaishali

Reputation: 38415

Try

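import re
# \W+ removes runs of non-word characters; note that this also removes spaces
# and dashes, which the question's original filter keeps.
df.item_name.apply(lambda x: re.sub(r'\W+', '', x))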

0     stback
1      yhhxx
2      adfgs
3    ghytt23
4      ghh_h
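The same substitution can also be done with pandas' vectorized string method instead of a Python lambda. A minimal sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({"item_name": ["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"]})

# \W+ matches runs of characters that are neither alphanumeric nor "_".
cleaned = df["item_name"].str.replace(r"\W+", "", regex=True)
print(cleaned)
```

Besides being shorter, the .str accessor passes missing values (NaN) through unchanged, whereas re.sub inside apply would raise a TypeError on them.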

Upvotes: 3
