Reputation: 2998
I have a dataframe like this:
Example1:
score_df
========
rid score Flag
1 5 y
1 6 n
1 7 y
1 8 n
I need to copy the score from the row where Flag = 'y' first appears to all the rows after it. In this example, Flag = 'y' first appears on the first row, where score is 5, so I would copy the score 5 to the remaining rows. The output would look like this:
score_df
========
rid score Flag
1 5 y
1 5 n
1 5 y
1 5 n
Example2:
score_df
========
rid score Flag
1 5 n
1 6 y
1 7 n
1 8 y
In this case, the Flag value is first y on the second row, where score is 6, so the first row keeps its score and the rows after the second get 6. The output would look like this:
score_df
========
rid score Flag
1 5 n
1 6 y
1 6 n
1 6 y
So, I would like to fill the rows based on the first occurrence of the flag value. How can I do this in pandas without looping?
Upvotes: 0
Views: 247
Reputation: 863116
Create a mask that matches every row from the first y value onward: compare with Series.eq, take the cumulative sum with Series.cumsum, and test for greater than 0 with Series.gt. Then assign the first matched score to the score column of the masked rows only:
m = df['Flag'].eq('y').cumsum().gt(0)
df.loc[m, 'score'] = df.loc[m, 'score'].iat[0]
print (df)
rid score Flag
0 1 5 y
1 1 5 n
2 1 5 y
3 1 5 n
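When the first y is on the second row, the same two lines (with the mask recomputed for the new data) leave the first row untouched:
m = df['Flag'].eq('y').cumsum().gt(0)
df.loc[m, 'score'] = df.loc[m, 'score'].iat[0]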
print (df)
rid score Flag
0 1 5 n
1 1 6 y
2 1 6 y
3 1 6 n
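To make the mask easier to follow, here is a minimal self-contained sketch that rebuilds Example 2 from the question by hand and keeps each intermediate step in its own variable. The names is_y, seen and first_score are only illustrative and are not part of the answer above.

```python
import pandas as pd

# Example 2 from the question, rebuilt by hand.
df = pd.DataFrame({
    "rid": [1, 1, 1, 1],
    "score": [5, 6, 7, 8],
    "Flag": ["n", "y", "n", "y"],
})

is_y = df["Flag"].eq("y")   # True where Flag is 'y'
seen = is_y.cumsum()        # running count of 'y' rows seen so far
m = seen.gt(0)              # True from the first 'y' row onward

# Score on the first row selected by the mask.
first_score = df.loc[m, "score"].iat[0]

# Overwrite the score on every row from the first 'y' onward.
df.loc[m, "score"] = first_score

print(df)
```

Rows before the first y are never selected by the mask, so their scores are left as they were.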
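If the column may contain no y value at all, the solution is slightly different, because .iat[0] fails on an empty selection. next with iter returns the first value if one exists and otherwise the default, 0 here. With no y the mask is all False, so nothing is selected and nothing is replaced.
This solution also works for the data above, so it is more general.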
df.loc[m, 'score'] = next(iter(df.loc[m, 'score']), 0)
print (df)
rid score Flag
0 1 5 n
1 1 6 n
2 1 7 n
3 1 8 n
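The three cases can be checked together with a small helper. This is only a sketch: the function name fill_from_first_flag and its parameters are hypothetical, and it guards the no-match case with an explicit m.any() check instead of the next(iter(...), 0) idiom used above. It also works on a copy, which is a design choice and not something the question requires.

```python
import pandas as pd

def fill_from_first_flag(df, flag_col="Flag", value_col="score", flag="y"):
    """Return a copy with value_col overwritten from the first flagged row on."""
    out = df.copy()
    # True on the first row where flag_col == flag and on every later row.
    m = out[flag_col].eq(flag).cumsum().gt(0)
    # With no flagged row the mask is all False, so skip the assignment.
    if m.any():
        out.loc[m, value_col] = out.loc[m, value_col].iat[0]
    return out

# Sample frames mirroring the question's examples plus a no-'y' case.
ex1 = pd.DataFrame({"rid": [1] * 4, "score": [5, 6, 7, 8], "Flag": list("ynyn")})
ex2 = pd.DataFrame({"rid": [1] * 4, "score": [5, 6, 7, 8], "Flag": list("nyny")})
no_y = pd.DataFrame({"rid": [1] * 4, "score": [5, 6, 7, 8], "Flag": list("nnnn")})

for frame in (ex1, ex2, no_y):
    print(fill_from_first_flag(frame))
```

Because the helper returns a copy, the input frames keep their original scores.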
Upvotes: 2