Reputation: 87
In calculating grades, I drop each student's two lowest homework scores. A sample dataframe is shown here:
df=pd.DataFrame([[10, 9, 10, 5, 7], [8, 7, 9, 9, 4], [10, 10, 7, 0, 8],
[5, 9, 7, 6, 3], [10, 5, 0, 8, 10], [8, 9, 10, 10, 10]],
columns=['HW1', 'HW2', 'HW3', 'HW4', 'HW5'],
index=['Aaron', 'Bridget', 'Charles', 'Donna', 'Evan', 'Francesca'])
df
The actual dataframe contains more columns than this (for tests, reports, etc.), but it's homework assignments where I need to find the two lowest scores for each record.
I want to add two columns in the dataframe, ['Lowest'] and ['Second_Lowest'], and indicate each student's lowest and second lowest scores, respectively, in those columns.
I tried the following code to get the lowest value, using the .min() method, but got an error:
df['Lowest_HW'] = df[['HW1', 'HW2', 'HW3', 'HW4', 'HW5']].min()
df.head()
So for Aaron, the value for Lowest will be 5 and Second_Lowest will be 7; for Francesca, Lowest will be 8 and Second_Lowest will be 9.
Apparently, the code for the .min() method isn't right, and I'm completely clueless about finding the second lowest value.
Any advice on how I might tackle these steps?
Upvotes: 3
Views: 1243
Reputation: 29635
You can use np.sort
and select only the first 2 columns to create both columns lowest at once:
df['lowest'], df['second_lowest'] = np.sort(df, axis=1)[:, :2].T
print (df)
HW1 HW2 HW3 HW4 HW5 lowest second_lowest
Aaron 10 9 10 5 7 5 7
Bridget 8 7 9 9 4 4 7
Charles 10 10 7 0 8 0 7
Donna 5 9 7 6 3 3 5
Evan 10 5 0 8 10 0 5
Francesca 8 9 10 10 10 8 9
but if your point is to get the sum without these two lowest values, then do not create the above columns and do
df['sum_without_2lowest'] = np.sort(df, axis=1)[:, 2:].sum(1)
print (df)
HW1 HW2 HW3 HW4 HW5 sum_without_2lowest
Aaron 10 9 10 5 7 29
Bridget 8 7 9 9 4 26
Charles 10 10 7 0 8 28
Donna 5 9 7 6 3 22
Evan 10 5 0 8 10 28
Francesca 8 9 10 10 10 30
Upvotes: 4
Reputation: 1560
Another method again, using nsmallest
. This time it directly removes the two lowest grades from the DataFrame
df[df.apply(lambda x: x.nsmallest(2), axis=1).isna()]
HW1 HW2 HW3 HW4 HW5
Aaron 10.0 9.0 10.0 NaN NaN
Bridget 8.0 NaN 9.0 9.0 NaN
Charles 10.0 10.0 NaN NaN 8.0
Donna NaN 9.0 7.0 6.0 NaN
Evan 10.0 NaN NaN 8.0 10.0
Francesca NaN NaN 10.0 10.0 10.0
As you mentioned in the comments that you want to get the sum of all grades minus the two lowest ones, this can easily do the trick!
df[df.apply(lambda x: x.nsmallest(2), axis=1).isna()].sum(axis=1)
Aaron 29.0
Bridget 26.0
Charles 28.0
Donna 22.0
Evan 28.0
Francesca 30.0
dtype: float64
Upvotes: 0
Reputation: 3676
You could do so using apply
, sorting the columns and selecting the first and the 2nd value.
columns = [c for c in df.columns if c.startswith('HW')]
df[['lowest', '2nd_lowest']] = df[columns].apply(lambda x: sorted(x)[0:2], axis=1, result_type='expand')
HW1 HW2 HW3 HW4 HW5 lowest 2nd_lowest
Aaron 10 9 10 5 7 5 7
Bridget 8 7 9 9 4 4 7
Charles 10 10 7 0 8 0 7
Donna 5 9 7 6 3 3 5
Evan 10 5 0 8 10 0 5
Francesca 8 9 10 10 10 8 9
Upvotes: 1