Aero

Reputation: 311

Pandas dataframe replace string in multiple columns by finding substring

I have a very large pandas data frame containing both string and integer columns. I'd like to search the whole data frame for a specific substring, and if found, replace the full string with something else.

I've found some examples that do this by specifying the column(s) to search, like this:

import pandas as pd

df = pd.DataFrame([[1, 'A'], [2, '(B,D,E)'], [3, 'C']], columns=['Question', 'Answer'])
df.loc[df['Answer'].str.contains(','), 'Answer'] = 'X'

But because my data frame has dozens of string columns in no particular order, I don't want to specify them all. As far as I can tell, df.replace will not work on its own, since I'm only searching for a substring rather than the full cell value. Thanks for your help!

Upvotes: 3

Views: 19474

Answers (1)

akuiper

Reputation: 215127

You can use the data frame replace method with regex=True and the pattern .*,.* to match strings that contain a comma (you can swap the comma for any other substring you want to detect). Because the pattern covers the whole string, the whole string gets replaced:

str_cols = ['Answer']    # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
df
#    Question Answer
# 0         1      A
# 1         2      X
# 2         3      C
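
To see why the pattern needs the leading and trailing .*, here is a small sketch (the variable names are just for illustration) comparing a bare ',' pattern with '.*,.*'. With regex=True, replace substitutes only the part of the string the pattern matched, so the bare comma changes just the commas:

```python
import pandas as pd

# Illustrative data; dtype=object keeps this a plain object column.
answers = pd.Series(['A', '(B,D,E)', 'C'], dtype=object)

# Pattern matches only the commas, so only the commas are substituted.
partial = answers.replace(',', 'X', regex=True)

# Pattern spans the whole string, so the whole string is substituted.
full = answers.replace('.*,.*', 'X', regex=True)

print(partial.tolist())  # ['A', '(BXDXE)', 'C']
print(full.tolist())     # ['A', 'X', 'C']
```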

Or, if you want to replace in all string columns, select them by dtype instead of listing them:

str_cols = df.select_dtypes(['object']).columns
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
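
Putting the pieces together, a minimal sketch of the whole approach might look like the following. The helper name replace_cells_containing, the Comment column, and the demo data are made up for illustration. It assumes the string columns have object dtype, as in the answer above (the demo forces this with astype, since newer pandas versions may infer a dedicated string dtype instead), and it uses re.escape so that a substring like '.' is treated literally rather than as a regex metacharacter:

```python
import re
import pandas as pd

def replace_cells_containing(df, substring, new_value):
    """Return a copy of df in which every string cell containing
    `substring` is replaced entirely by `new_value`.
    (Hypothetical helper, not part of pandas.)"""
    out = df.copy()
    # Assumption: string columns are stored with object dtype.
    str_cols = out.select_dtypes(include=['object']).columns
    # Escape the substring so regex metacharacters are matched literally.
    pattern = '.*' + re.escape(substring) + '.*'
    out[str_cols] = out[str_cols].replace(pattern, new_value, regex=True)
    return out

# Demo frame with one integer column and two string columns.
demo = pd.DataFrame({
    'Question': [1, 2, 3],
    'Answer': ['A', '(B,D,E)', 'C'],
    'Comment': ['x.y', 'plain', 'a,b'],
}).astype({'Answer': object, 'Comment': object})

result = replace_cells_containing(demo, ',', 'X')
print(result)
```

The integer column is left alone because it is never selected. One caveat: '.' does not match newlines by default, so a cell containing a line break would only be partly replaced unless you prefix the pattern with (?s).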

Upvotes: 10
