I have a dataframe similar to the one below:
A B C
1 0 0.0
1 2 0.2
1 3 1.0
2 1 0.2
2 4 0.0
2 6 1.0
3 1 0.4
3 2 1.0
3 0 0.9
3 3 0.0
Now, for each subgroup of rows that share the same A value, I want to find the row with the minimum B value and change that row's C value to 0.5. For the example above, I would obtain this new dataframe:
A B C
1 0 0.5
1 2 0.2
1 3 1.0
2 1 0.5
2 4 0.0
2 6 1.0
3 1 0.4
3 2 1.0
3 0 0.5
3 3 0.0
As an addendum: if this operation replaces a 0.0 or 1.0 in column C, I'd like the row to be duplicated, with the copy keeping its old value. In this example, the A=1 subgroup triggers this rule (0.0 is replaced with 0.5) and should therefore produce:
A B C
1 0 0.0
1 0 0.5
1 2 0.2
1 3 1.0
...
The first problem is the main one; the second isn't a priority, but I would of course welcome help with either.
Try:
df.loc[df.groupby('A')['B'].idxmin(), 'C'] = 0.5
Output:
A B C
0 1 0 0.5
1 1 2 0.2
2 1 3 1.0
3 2 1 0.5
4 2 4 0.0
5 2 6 1.0
6 3 1 0.4
7 3 2 1.0
8 3 0 0.5
9 3 3 0.0
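A self-contained sketch of the one-liner above, rebuilding the sample frame from the question so it can be run directly. `idxmin` returns index labels rather than positions, so this assumes the index labels are unique (the default `RangeIndex` here); if several rows in a group tie for the minimum B, only the first of them is changed.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    'B': [0, 2, 3, 1, 4, 6, 1, 2, 0, 3],
    'C': [0.0, 0.2, 1.0, 0.2, 0.0, 1.0, 0.4, 1.0, 0.9, 0.0],
})

# Index label of the first row with the smallest B in each A group
min_rows = df.groupby('A')['B'].idxmin()

# Overwrite C on exactly those rows
df.loc[min_rows, 'C'] = 0.5
print(df)
```

If every tied row should be updated instead, `df['B'].eq(df.groupby('A')['B'].transform('min'))` gives a boolean mask of all rows equal to their group's minimum; that is an alternative behaviour, not what the one-liner does.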
For the addendum:
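import pandas as pd

# index labels of the minimum-B row in each A group
min_rows = df.groupby('A')['B'].idxmin()
# copies of those rows whose C is 0.0 or 1.0, taken before overwriting
old_rows = df.loc[min_rows].loc[lambda x: x['C'].isin([0.0, 1.0])].copy()
# change all min rows to 0.5
df.loc[min_rows, 'C'] = 0.5
# append the saved rows (with their old C values) to the end
df = pd.concat([df, old_rows])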
Output (notice the last row):
A B C
0 1 0 0.5
1 1 2 0.2
2 1 3 1.0
3 2 1 0.5
4 2 4 0.0
5 2 6 1.0
6 3 1 0.4
7 3 2 1.0
8 3 0 0.5
9 3 3 0.0
0 1 0 0.0
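The concat above appends the saved rows at the end of the frame. If each duplicate should instead sit directly above its updated copy, as in the layout shown in the question, one option (a sketch, assuming unique index labels in the original frame) is to put the saved rows first in the concat and then sort stably on the index, so that among rows sharing a label the old value stays in front. Rows whose C is 0.0 or 1.0 are both kept, as the question asks.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    'B': [0, 2, 3, 1, 4, 6, 1, 2, 0, 3],
    'C': [0.0, 0.2, 1.0, 0.2, 0.0, 1.0, 0.4, 1.0, 0.9, 0.0],
})

min_rows = df.groupby('A')['B'].idxmin()

# Save minimum-B rows whose C is 0.0 or 1.0 before overwriting them
old_rows = df.loc[min_rows].loc[lambda x: x['C'].isin([0.0, 1.0])].copy()

df.loc[min_rows, 'C'] = 0.5

# Saved rows go first; a stable sort keeps that order among equal labels,
# so each old row ends up directly above its updated copy
result = pd.concat([old_rows, df]).sort_index(kind='stable')
print(result)
```

Append `.reset_index(drop=True)` if a fresh 0..n-1 index is preferred over the repeated label.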