Reputation: 3362
Let's say I have a dataframe, df4, that looks like this:
>>> import pandas as pd
>>> df4 = pd.DataFrame({'Q':['apple', 'apple', 'orange', 'Apple', 'orange'], 'R':['a.txt', 'a.txt', 'a.txt', 'b.txt', 'b.txt']})
>>> df4
        Q      R
0   apple  a.txt
1   apple  a.txt
2  orange  a.txt
3   Apple  b.txt
4  orange  b.txt
What I would like to output is this:
           Q      R
0  breakfast  a.txt
1      apple  a.txt
2     orange  a.txt
3  breakfast  b.txt
4     orange  b.txt
In other words, I want to search every row of the dataframe case-insensitively, find the first occurrence of certain words (in this case, apple), and replace it with another word.
Is there a way to do this?
Upvotes: 2
Views: 78
Reputation: 294278
I just really wanted to answer this question.
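def swap_first(s):
    swap = True  # True while the next match would start a new run
    luk4 = {'apple'}  # words to look up, in lowercase
    for x in s:
        if x.lower() in luk4:
            # replace only the first match of each consecutive run
            yield 'breakfast' if swap else x
            swap = False
        else:
            yield x
            swap = True  # a non-match ends the run

df4.assign(Q=[*swap_first(df4.Q)])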
           Q      R
0  breakfast  a.txt
1      apple  a.txt
2     orange  a.txt
3  breakfast  b.txt
4     orange  b.txt
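If you need more than one word or a different replacement, the same idea can be parametrised. This is only a sketch: the names replace_first_in_runs, targets, and replacement are made up for illustration, and it assumes "first occurrence" means the first match in each consecutive run of matches, which is what the expected output above shows.

```python
def replace_first_in_runs(values, targets, replacement):
    """Yield each value, swapping in `replacement` for the first item
    of every consecutive run of case-insensitive matches."""
    targets = {t.lower() for t in targets}  # normalise once for lookup
    in_run = False  # True while the previous value was a match
    for x in values:
        is_match = x.lower() in targets
        # only the first match after a non-match (or at the start) is replaced
        yield replacement if is_match and not in_run else x
        in_run = is_match


# Plain-list demo using the Q column from the question (hypothetical usage)
q = ['apple', 'apple', 'orange', 'Apple', 'orange']
result = list(replace_first_in_runs(q, {'apple'}, 'breakfast'))
```

With the frame from the question, this would presumably be called as df4.assign(Q=[*replace_first_in_runs(df4.Q, {'apple'}, 'breakfast')]).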
Upvotes: 1
Reputation: 402533
Here's a vectorised solution with groupby and idxmin:
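v = df4.Q.str.lower().eq('apple')  # True where Q is 'apple', ignoring case
v2 = (~v).cumsum().where(v)  # one id per consecutive run of matches, NaN elsewhere
df4.loc[v2.groupby(v2).idxmin().values, 'Q'] = 'breakfast'  # first row of each run
df4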
           Q      R
0  breakfast  a.txt
1      apple  a.txt
2     orange  a.txt
3  breakfast  b.txt
4     orange  b.txt
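To see why this works, it may help to look at the intermediate values. The sketch below rebuilds the frame from the question and, unlike the snippet above, writes to a copy (the names first_idx and out are mine, not part of the answer). It assumes the index labels are unique.

```python
import pandas as pd

df4 = pd.DataFrame({'Q': ['apple', 'apple', 'orange', 'Apple', 'orange'],
                    'R': ['a.txt', 'a.txt', 'a.txt', 'b.txt', 'b.txt']})

# Boolean mask: True where Q equals 'apple', ignoring case
v = df4.Q.str.lower().eq('apple')
# v -> [True, True, False, True, False]

# (~v).cumsum() steps up by one at every non-match, so all rows in one
# consecutive run of matches share the same number; .where(v) then
# replaces the non-matching rows with NaN
v2 = (~v).cumsum().where(v)
# v2 -> [0.0, 0.0, NaN, 1.0, NaN]

# groupby drops the NaN keys; within a group every value is equal,
# so idxmin returns the first index label of each run
first_idx = v2.groupby(v2).idxmin().values
# first_idx -> [0, 3]

out = df4.copy()  # leave the original frame untouched
out.loc[first_idx, 'Q'] = 'breakfast'
```

The run id is the count of non-matching rows seen so far, so a new run only starts after a row that is not apple. Two apples in a row that happen to sit in different files would still count as one run.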
Upvotes: 6