For the following DataFrame:
import pandas as pd

my_cols = ["a", "b", "c"]
df2 = pd.DataFrame([["1a", "2a", "3a"], ["4aa", "5a", "6a"], ["7a", "8a", "9a"],
                    ["1a", "2a", "3a"], ["4a", "5a", "6a"], ["7a", "8a", "9a"]],
                   columns=my_cols)
df2:
     a   b   c
0   1a  2a  3a
1  4aa  5a  6a
2   7a  8a  9a
3   1a  2a  3a
4   4a  5a  6a
5   7a  8a  9a
For each row, I want to check whether a value contains the substring 4a. If it does, I want to replace every a in that whole row with b:
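my_str = "4a"
for x in range(df2.shape[0]):
    # check the value in column "a" of row x
    if my_str in df2.loc[x, "a"]:
        # replace every "a" with "b" in each cell of that row (.loc avoids chained assignment)
        for y in range(len(my_cols)):
            df2.loc[x, my_cols[y]] = df2.loc[x, my_cols[y]].replace("a", "b")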
df2:
     a   b   c
0   1a  2a  3a
1  4bb  5b  6b
2   7a  8a  9a
3   1a  2a  3a
4   4b  5b  6b
5   7a  8a  9a
This approach seems inefficient because of the nested loops and the cell-by-cell assignment with replace(). Are there built-in methods that could do the job? Any improvement would be appreciated.
Thanks to the contributions of @yatu and @Alessia Mondolo, this is the answer:
m = df2["a"].str.contains(my_str, na=False)
df2[m] = df2[m].replace({'a': 'b'}, regex=True)
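The two lines above reuse df2 and my_str from the question. Below is a self-contained sketch of the same idea. Two details are my own choices, not part of the original answer: it selects the rows with .loc instead of df2[m], and it passes regex=False to str.contains so that my_str is matched as a literal substring rather than as a regular expression.

```python
import pandas as pd

my_cols = ["a", "b", "c"]
df2 = pd.DataFrame([["1a", "2a", "3a"], ["4aa", "5a", "6a"], ["7a", "8a", "9a"],
                    ["1a", "2a", "3a"], ["4a", "5a", "6a"], ["7a", "8a", "9a"]],
                   columns=my_cols)
my_str = "4a"

# Boolean mask: True for rows whose column "a" contains my_str literally
m = df2["a"].str.contains(my_str, regex=False, na=False)

# In the selected rows only, replace every "a" with "b" in every column
df2.loc[m] = df2.loc[m].replace({"a": "b"}, regex=True)

print(df2)
```

With regex=True, the dict key "a" is treated as a pattern and every occurrence is replaced, so "4aa" becomes "4bb". Rows that are not selected by the mask are left untouched.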
A possible solution is the following:
import pandas as pd

my_cols = ["a", "b", "c"]
df2 = pd.DataFrame([["1a", "2a", "3a"], ["4aa", "5a", "6a"], ["7a", "8a", "9a"],
                    ["1a", "2a", "3a"], ["4a", "5a", "6a"], ["7a", "8a", "9a"]],
                   columns=my_cols)
mask = df2.apply(lambda row: row.astype(str).str.contains('4a').any(), axis=1)
df2.loc[df2[mask].index, df2.columns] = df2[mask].replace({'a': 'b'}, regex=True)
df2:
     a   b   c
0   1a  2a  3a
1  4bb  5b  6b
2   7a  8a  9a
3   1a  2a  3a
4   4b  5b  6b
5   7a  8a  9a
First, we create a mask that identifies all the rows in which at least one column contains the substring '4a'. Then we overwrite those rows with a copy of them in which every 'a' has been replaced by 'b'.
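The row-wise apply(..., axis=1) calls the lambda once per row, which can become slow on large frames. A possible alternative, sketched here under the assumption that every column holds strings, is to apply the .str accessor once per column and then reduce with any(axis=1). This is a swapped-in variant, not the answer's own code.

```python
import pandas as pd

my_cols = ["a", "b", "c"]
df2 = pd.DataFrame([["1a", "2a", "3a"], ["4aa", "5a", "6a"], ["7a", "8a", "9a"],
                    ["1a", "2a", "3a"], ["4a", "5a", "6a"], ["7a", "8a", "9a"]],
                   columns=my_cols)

# One boolean column per original column: does the cell contain "4a" literally?
hits = df2.astype(str).apply(lambda col: col.str.contains("4a", regex=False))

# A row is selected if any of its cells matched
mask = hits.any(axis=1)

# Rewrite only the selected rows, replacing every "a" with "b"
df2.loc[mask] = df2.loc[mask].replace({"a": "b"}, regex=True)

print(df2)
```

Unlike the self-answer, which only inspects column a, this mask checks every column, matching the behaviour described in this answer. Writing df2.loc[mask] on both sides is equivalent to the longer df2.loc[df2[mask].index, df2.columns] form, because the assigned DataFrame is aligned on index and columns.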