Vega
Vega

Reputation: 2929

How to find the maximum value of 3 columns and multiply the other 2 with a fixed factor, then add all together?

I have a large dataframe df:

df = pd.DataFrame(
    {
        "length": [1200,600, 600,1201, 600, 600, 500, 700],
        "width": [600, 1200, 600, 600, 1201, 600, 800, 700],
        "height": [600, 600, 1200, 600, 600, 1201, 800, 700],
    }
)
length width height
1200 600 600
600 1200 600
600 600 1200
1201 600 600
600 1201 600
600 600 1201
500 800 800
700 700 700

I need to find the highest value per row, then multiply the other with *2 and add all together.

So the formula is:

highest_value + (lower_value_1 * 2) + (lower_value_2 * 2)

The output should be:

length width height girth
1200 600 600 3600
600 1200 600 3600
600 600 1200 3600
1201 600 600 3601
600 1201 600 3601
600 600 1201 3601
500 500 800 3400
700 700 700 3500

I can find the maximum of the 3 rows with

df[["length ", "width ", "height "]].max(axis=1)

but I need help with the rest.

Upvotes: 0

Views: 492

Answers (2)

user17242583
user17242583

Reputation:

Here is a very fast, vectorized solution that takes advantage of numpy broadcasting to achieve very high performance:

mask = pd.DataFrame(df.to_numpy() != df.max(axis=1).to_numpy().reshape(-1,1), columns=df.columns)
df['girth'] = df.add(df[mask].fillna(0)).sum(axis=1)

Output:

>>> df
   length  width  height   girth
0    1200    600     600  3600.0
1     600   1200     600  3600.0
2     600    600    1200  3600.0
3    1201    600     600  3601.0
4     600   1201     600  3601.0
5     600    600    1201  3601.0

On my 3.6 GHz 8-Core Intel Core i9, this took about ~950ms for 6,000,000 (6 million) rows.


Basically what we're doing here is:

  1. Creating a mask that selects all the values in the dataframe that are not the max of the row they're in
  2. Adding the values selected by that mask (the non-max values) to the dataframe again
  3. Summing the resulting rows

Upvotes: 0

Quixotic22
Quixotic22

Reputation: 2924

This seems to be a simpler solution

df['New'] = df.sum(axis=1)*2 - df.max(axis=1)

Edit, was meant to copy this

df['Girth'] = df.apply(lambda x: sum(x)*2 - max(x), axis = 1)

Upvotes: 2

Related Questions