Reputation: 87

How do I find the two lowest values across selected columns in each row of a pandas dataframe?

In calculating grades, I drop each student's two lowest homework scores. A sample dataframe is shown here:

df=pd.DataFrame([[10, 9, 10, 5, 7], [8, 7, 9, 9, 4], [10, 10, 7, 0, 8],
                [5, 9, 7, 6, 3], [10, 5, 0, 8, 10], [8, 9, 10, 10, 10]],
               columns=['HW1', 'HW2', 'HW3', 'HW4', 'HW5'],
               index=['Aaron', 'Bridget', 'Charles', 'Donna', 'Evan', 'Francesca'])

df

The actual dataframe contains more columns than this (for tests, reports, etc.), but it's homework assignments where I need to find the two lowest scores for each record.

I want to add two columns in the dataframe, ['Lowest'] and ['Second_Lowest'], and indicate each student's lowest and second lowest scores, respectively, in those columns.

I tried the following code to get the lowest value, using the .min() method, but got an error:

df['Lowest_HW'] = df[['HW1', 'HW2', 'HW3', 'HW4', 'HW5']].min()

df.head()

So for Aaron, the value for Lowest will be 5 and Second_Lowest will be 7; for Francesca, Lowest will be 8 and Second_Lowest will be 9.

Apparently, the code for the .min() method isn't right, and I'm completely clueless about finding the second lowest value.

Any advice on how I might tackle these steps?

Upvotes: 3

Answers (3)

Ben.T

Reputation: 29635

You can use np.sort and select only the first 2 columns to create both columns lowest at once:

df['lowest'], df['second_lowest'] = np.sort(df, axis=1)[:, :2].T
print (df)
           HW1  HW2  HW3  HW4  HW5  lowest  second_lowest
Aaron       10    9   10    5    7       5              7
Bridget      8    7    9    9    4       4              7
Charles     10   10    7    0    8       0              7
Donna        5    9    7    6    3       3              5
Evan        10    5    0    8   10       0              5
Francesca    8    9   10   10   10       8              9

but if your point is to get the sum without these two lowest values, then do not create the above columns and do

df['sum_without_2lowest'] = np.sort(df, axis=1)[:, 2:].sum(1)
print (df)
           HW1  HW2  HW3  HW4  HW5  sum_without_2lowest
Aaron       10    9   10    5    7                   29
Bridget      8    7    9    9    4                   26
Charles     10   10    7    0    8                   28
Donna        5    9    7    6    3                   22
Evan        10    5    0    8   10                   28
Francesca    8    9   10   10   10                   30

Upvotes: 4

Hugolmn

Reputation: 1560

Another method again, using nsmallest. This time it directly removes the two lowest grades from the DataFrame

df[df.apply(lambda x: x.nsmallest(2), axis=1).isna()]

            HW1   HW2   HW3   HW4   HW5
Aaron      10.0   9.0  10.0   NaN   NaN
Bridget     8.0   NaN   9.0   9.0   NaN
Charles    10.0  10.0   NaN   NaN   8.0
Donna       NaN   9.0   7.0   6.0   NaN
Evan       10.0   NaN   NaN   8.0  10.0
Francesca   NaN   NaN  10.0  10.0  10.0

As you mentioned in the comments that you want to get the sum of all grades minus the two lowest ones, this can easily do the trick!

df[df.apply(lambda x: x.nsmallest(2), axis=1).isna()].sum(axis=1)

Aaron        29.0
Bridget      26.0
Charles      28.0
Donna        22.0
Evan         28.0
Francesca    30.0
dtype: float64

Upvotes: 0

ScootCork

Reputation: 3676

You could do so using apply, sorting the columns and selecting the first and the 2nd value.

columns = [c for c in df.columns if c.startswith('HW')]
df[['lowest', '2nd_lowest']] = df[columns].apply(lambda x: sorted(x)[0:2], axis=1, result_type='expand')


           HW1  HW2  HW3  HW4  HW5  lowest  2nd_lowest
Aaron       10    9   10    5    7       5           7
Bridget      8    7    9    9    4       4           7
Charles     10   10    7    0    8       0           7
Donna        5    9    7    6    3       3           5
Evan        10    5    0    8   10       0           5
Francesca    8    9   10   10   10       8           9

Upvotes: 1

How do I find the two lowest values across selected columns in each row of a pandas dataframe?

Answers (3)

Related Questions