PedsB

Reputation: 311

Change value of a specific column on dataframe subgroups in pandas based on condition

I have a dataframe similar to the one below:

A B  C
1 0 0.0
1 2 0.2
1 3 1.0
2 1 0.2
2 4 0.0
2 6 1.0
3 1 0.4
3 2 1.0
3 0 0.9
3 3 0.0

For each subgroup (rows that share the same A value), I want to find the row with the minimum B value and set C for that row to 0.5. For the data above, I would obtain this new dataframe:

A B  C
1 0 0.5
1 2 0.2
1 3 1.0
2 1 0.5
2 4 0.0
2 6 1.0
3 1 0.4
3 2 1.0
3 0 0.5
3 3 0.0

As an addendum, if this operation replaces a 0.0 or 1.0 in the C column, I'd like the row to be duplicated with its old value. Here, the A=1 subgroup triggers this rule (0.0 is replaced with 0.5), so it should produce:

A B  C
1 0 0.0
1 0 0.5
1 2 0.2
1 3 1.0
...

The first problem is the main one; the second isn't a priority, but I would welcome help with either.

Upvotes: 2

Views: 170

Answers (1)

Quang Hoang

Reputation: 150745

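Use groupby with idxmin to get the index label of the minimum-B row in each A group, then assign through .loc: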

df.loc[df.groupby('A')['B'].idxmin(), 'C'] = 0.5

Output:

   A  B    C
0  1  0  0.5
1  1  2  0.2
2  1  3  1.0
3  2  1  0.5
4  2  4  0.0
5  2  6  1.0
6  3  1  0.4
7  3  2  1.0
8  3  0  0.5
9  3  3  0.0
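To see why the one-liner works, it may help to look at the intermediate result. The sketch below rebuilds the sample data from the question (the default integer index is an assumption, matching the output shown above) and inspects what `idxmin` returns before the assignment.

```python
import pandas as pd

# Sample data copied from the question; a default RangeIndex is assumed.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "B": [0, 2, 3, 1, 4, 6, 1, 2, 0, 3],
    "C": [0.0, 0.2, 1.0, 0.2, 0.0, 1.0, 0.4, 1.0, 0.9, 0.0],
})

# One index label per A group: the row where B is smallest in that group.
min_rows = df.groupby("A")["B"].idxmin()
print(min_rows)

# .loc accepts those labels directly, so only the selected rows are modified.
df.loc[min_rows, "C"] = 0.5
print(df)
```

If a group has several rows tied for the minimum B, `idxmin` returns the first one, so only that row is changed.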

For the addendum:

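import pandas as pd

# index labels of the minimum-B row in each A group
min_rows = df.groupby('A')['B'].idxmin()
# copies of the minimum-B rows whose C is 0.0 or 1.0, taken before overwriting
old_rows = df.loc[min_rows].loc[lambda x: x['C'].isin([0.0, 1.0])].copy()

# change all min rows to 0.5
df.loc[min_rows, 'C'] = 0.5

# append the saved rows with their old C values
df = pd.concat([df, old_rows])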

Output (notice the last row):

   A  B    C
0  1  0  0.5
1  1  2  0.2
2  1  3  1.0
3  2  1  0.5
4  2  4  0.0
5  2  6  1.0
6  3  1  0.4
7  3  2  1.0
8  3  0  0.5
9  3  3  0.0
0  1  0  0.0
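The output above leaves the duplicated row at the end, whereas the question shows it directly before the modified row. One possible way to get that ordering, sketched here on the question's sample data, is to put the saved rows first in the concat and then sort by index with a stable sort, so the old-value copy stays ahead of the updated row that shares its label. The names `old_rows` and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question; a default RangeIndex is assumed.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "B": [0, 2, 3, 1, 4, 6, 1, 2, 0, 3],
    "C": [0.0, 0.2, 1.0, 0.2, 0.0, 1.0, 0.4, 1.0, 0.9, 0.0],
})

min_rows = df.groupby("A")["B"].idxmin()

# Save the minimum-B rows whose C is 0.0 or 1.0 before overwriting them.
old_rows = df.loc[min_rows].loc[lambda x: x["C"].isin([0.0, 1.0])].copy()
df.loc[min_rows, "C"] = 0.5

# Saved rows go first; mergesort is stable, so for equal index labels the
# old-value copy stays ahead of the updated row.
result = (
    pd.concat([old_rows, df])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

Resetting the index at the end is optional; it just removes the duplicated label left behind by the concat.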

Upvotes: 3
