Reputation: 1037

Generate summary for each row based on column values

i have data frame like this.

import pandas as pd

raw_data = {'Sub1':['A','B','C','D','E'],
            'Sub2':['F','G','H','I','J'],
            'Sub3':['K','L','M','N','O'],
    'S_score1': [1, 0, 0, 6,0], 
    'S_score2': [0, 1, 0, 6,0], 
    'S_score3': [0, 1, 0, 6,0], 
    }

df2 = pd.DataFrame(raw_data, columns = ['Sub1','Sub2','Sub3','S_score1', 'S_score2', 'S_score3'])

Have data frame

i wants to check score columns and check if the score is greater than 1 then take the respective subject in text.

Wanted Output:

Upvotes: 1

Answers (3)

Valdi_Bo

Reputation: 31011

One of possible solutions consists of the following steps:

Define a function generating the output text for a source row. This function should join source columns filtered for not null.
Generate subs table containing Sub1, Sub2 and Sub3.
Generate msk (mask) table containing S_score... columns and change column names to Sub1, Sub2 and Sub3.
Compute subs.where(msk) and apply the above function to each row. Note that for False elements in mask, the respective output element is None, so the function applied will not include it in join.

So the whole script can look like below:

def txt(x):
    tbl = list(filter(lambda elem: not pd.isnull(elem), x))
    if len(tbl) > 0:
        return 'You have scored on ' + ', '.join(tbl)
    else:
        return 'You have not scored any subject'

subs = df.loc[:, :'Sub3']
msk = df.loc[:, 'S_score1':] > 0
msk.columns = ['Sub1', 'Sub2', 'Sub3']
df['s_text'] = subs.where(msk).apply(txt, axis=1)

Upvotes: 0

cs95

Reputation: 403050

First, separate out the grade columns from the one hot columns.

u = df2.filter(like='Sub')
v = df2.filter(like='S_score').astype(bool)

Next, aggregate letter grades by multiplication, and set column values.

r = (u.mul(v.values)
      .agg(','.join, axis=1)
      .str.strip(',')
      .str.replace(',{2,}', ','))
df2['s_text'] = np.where(r.str.len() > 0, 'You scored ' + r, 'N/A')    
df2

  Sub1 Sub2 Sub3  S_score1  S_score2  S_score3            s_text
0    A    F    K         1         0         0      You scored A
1    B    G    L         0         1         1    You scored G,L
2    C    H    M         0         0         0               N/A
3    D    I    N         6         6         6  You scored D,I,N
4    E    J    O         0         0         0               N/A

To make the last separator different, you will need a custom function.

def join(lst):
    lst = lst[lst != '']
    if len(lst) > 1:
        return 'You scored ' + ', '.join(lst[:-1]) + ' and ' + lst[-1] 
    elif len(lst) > 0:
        return 'You scored ' + ', '.join(lst)
    return 'N/A'

df2['s_text'] = u.mul(v.values).agg(join, axis=1)
df2

  Sub1 Sub2 Sub3  S_score1  S_score2  S_score3                 s_text
0    A    F    K         1         0         0           You scored A
1    B    G    L         0         1         1     You scored G and L
2    C    H    M         0         0         0                    N/A
3    D    I    N         6         6         6  You scored D, I and N
4    E    J    O         0         0         0                    N/A

Upvotes: 2

BENY

Reputation: 323366

Doing with join after multiple

s=(df2.filter(like='Sub')*df2.filter(like='S_').ge(1).values).apply( lambda x : ','.join([y for y in x if y is not '']),axis=1)
s
Out[324]: 
0        A
1      G,L
2         
3    D,I,N
4         
dtype: object

Then chain with np.where

np.where(s=='','You do not have score','You have'+s)
Out[326]: 
array(['You haveA', 'You haveG,L', 'You do not have score',
       'You haveD,I,N', 'You do not have score'], dtype=object)

#Assign it back 

df2['s_txt']=np.where(s=='','You do not have score','You have'+s)
df2
Out[328]: 
  Sub1 Sub2          ...           S_score3                  s_txt
0    A    F          ...                  0              You haveA
1    B    G          ...                  1            You haveG,L
2    C    H          ...                  0  You do not have score
3    D    I          ...                  6          You haveD,I,N
4    E    J          ...                  0  You do not have score
[5 rows x 7 columns]

Upvotes: 2

Generate summary for each row based on column values

Answers (3)

Related Questions