S. Slade
S. Slade

Reputation: 67

How to combine more than two columns?

I have a list of students in a csv file. I want (using Python) to display four columns in that I want to display the male students who have higher marks in Maths, Computer, and Physics.

I tried to use pandas library.

marks = pd.concat([data['name'], 
                 data.loc[data['students']==1, 'maths'].nlargest(n=10)], 'computer'].nlargest(n=10)], 'physics'].nlargest(n=10)])

I used 1 for male students and 0 for female students. It gives me an error saying: Invalid syntax.

Upvotes: 1

Views: 85

Answers (1)

Stefan
Stefan

Reputation: 42875

Here's a way to show the top 10 students in each of the disciplines. You could of course just sum the three scores and select the students with the highest total if you want the combined as opposed to the individual performance (see illustration below).

df1 = pd.DataFrame(data={'name': [''.join(random.choice('abcdefgh') for _ in range(8)) for i in range(100)],
                         'students': np.random.randint(0, 2, size=100)})
df2 = pd.DataFrame(data=np.random.randint(0, 10, size=(100, 3)), columns=['math', 'physics', 'computers'])
data = pd.concat([df1, df2], axis=1)

data.info()

RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
name         100 non-null object
students     100 non-null int64
math         100 non-null int64
physics      100 non-null int64
computers    100 non-null int64
dtypes: int64(4), object(1)
memory usage: 4.0+ KB

res = pd.concat([data.loc[:, ['name']], data.loc[data['students'] == 1, 'math'].nlargest(n=10), data.loc[data['students'] == 1, 'physics'].nlargest(n=10), data.loc[data['students'] == 1, 'computers'].nlargest(n=10)], axis=1)

res.dropna(how='all', subset=['math', 'physics', 'computers'])

        name  math  physics  computers
0   geghhbce   NaN      9.0        NaN
1   hbbdhcef   NaN      7.0        NaN
4   ghgffgga   NaN      NaN        8.0
6   hfcaccgg   8.0      NaN        NaN
14  feechdec   NaN      NaN        8.0
15  dfaabcgh   9.0      NaN        NaN
16  ghbchgdg   9.0      NaN        NaN
23  fbeggcha   NaN      NaN        9.0
27  agechbcf   8.0      NaN        NaN
28  bcddedeg   NaN      NaN        9.0
30  hcdgbgdg   NaN      8.0        NaN
38  fgdfeefd   NaN      NaN        9.0
39  fbcgbeda   9.0      NaN        NaN
41  agbdaegg   8.0      NaN        9.0
49  adgbefgg   NaN      8.0        NaN
50  dehdhhhh   NaN      NaN        9.0
55  ccbaaagc   NaN      8.0        NaN
68  hhggfffe   8.0      9.0        NaN
71  bhggbheg   NaN      9.0        NaN
84  aabcefhf   NaN      NaN        9.0
85  feeeefbd   9.0      NaN        NaN
86  hgeecacc   NaN      8.0        NaN
88  ggedgfeg   9.0      8.0        NaN
89  faafgbfe   9.0      NaN        9.0
94  degegegd   NaN      8.0        NaN
99  beadccdb   NaN      NaN        9.0


data['total'] = data.loc[:, ['math', 'physics', 'computers']].sum(axis=1)
data[data.students==1].nlargest(10, 'total').sort_values('total', ascending=False)

        name  students  math  physics  computers  total
29  fahddafg         1     8        8          8     24
79  acchhcdb         1     8        9          7     24
9   ecacceff         1     7        9          7     23
16  dccefaeb         1     9        9          4     22
92  dhaechfb         1     4        9          9     22
47  eefbfeef         1     8        8          5     21
60  bbfaaada         1     4        7          9     20
82  fbbbehbf         1     9        3          8     20
18  dhhfgcbb         1     8        8          3     19
1   ehfdhegg         1     5        7          6     18

Upvotes: 1

Related Questions