Reputation: 311
I have a very large pandas data frame containing both string and integer columns. I'd like to search the whole data frame for a specific substring, and if found, replace the full string with something else.
I've found some examples that do this by specifying the column(s) to search, like this:
df = pd.DataFrame([[1,'A'], [2,'(B,D,E)'], [3,'C']],columns=['Question','Answer'])
df.loc[df['Answer'].str.contains(','), 'Answer'] = 'X'
But because my data frame has dozens of string columns in no particular order, I don't want to specify them all. As far as I can tell using df.replace
will not work since I'm only searching for a substring. Thanks for your help!
Upvotes: 3
Views: 19474
Reputation: 215127
You can use data frame replace
method with regex=True
, and use .*,.*
to match strings that contain a comma (you can replace comma with other any other substring you want to detect):
str_cols = ['Answer'] # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
df
#Question Answer
#0 1 A
#1 2 X
#2 3 C
or if you want to replace all string columns:
str_cols = df.select_dtypes(['object']).columns
Upvotes: 10