taco is delicious
taco is delicious

Reputation: 61

Appending column values into new cell in the same row in Pandas dataframe

I have a csv file that has columns name, sub_a, sub_b, sub_c, sub_d, segment and gender. I would like create a new column classes with all the classes (sub-columns) seperated by comma that each student takes.

What would be the easiest way to accomplish this?

The result dataframe should look like this:

+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes             |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1     | 1     | 0     | 1     | 1       | 0      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1     | 0     | 1     | 1     | 0       | 0      | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1     | 1     | 0     | 1     | 1       | 1      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1     | 0     | 1     | 0     | 0       | 0      | sub_a, sub_c        |
+------+-------+-------+-------+-------+---------+--------+---------------------+

Upvotes: 5

Views: 971

Answers (4)

Dishin H Goyani
Dishin H Goyani

Reputation: 7723

You can use apply with axis=1

For Ex.: if your dataframe like

df
   A_a  A_b  B_b  B_c
0    1    0    0    1
1    0    1    0    1
2    1    0    1    0

you can do

df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
   A_a  A_b  B_b  B_c   classes
0    1    0    0    1  A_a, B_c
1    0    1    0    1  A_b, B_c
2    1    0    1    0  A_a, B_b

To apply on specific columns you can filter first using loc

#for your sample data
df_ = df.loc[:,'sub_a':'sub_d']             #or df.loc[:,'sub_a', 'sub_b', 'sub_c', 'sub_d']
df_.apply(lambda x: ', '.join(df_.columns[x==1]), axis = 1)

Upvotes: 1

BENY
BENY

Reputation: 323366

Let us try dot

s=df.filter(like='sub')
df['classes']=s.astype(bool).dot(s.columns+',').str[:-1]

Upvotes: 2

Huy Tran
Huy Tran

Reputation: 1902

You can use apply only on the sub-columns to apply a lambda function that will join the names of the sub-columns where the values of the columns equal 1:

sub_cols = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
df['classes'] = df[sub_cols].apply(lambda x: ', '.join(df[sub_cols].columns[x == 1]), axis=1)

Upvotes: 0

Izaak Cornelis
Izaak Cornelis

Reputation: 322

You indeed want to iterate through the rows. However, you can not directly add the classes to the DataFrame as all columns of the DataFrame need to be equally long. So the trick is to first generate the column and then add it later:

subjects = ['subj_a', 'subj_b', 'subj_c']
classes_per_student [] # the empty column

for _, student in df.iterrows():
    # first create a list of the classes taken by this student
    classes = [subj for subj in subjects if student[subj]]
    # create a single string
    classes = ', '.join(classes)  
    # append to the column under construction
    classes_per_student.append(classes)

# and finaly add the column to the DataFrame
df['classes'] = classes_per_student

Upvotes: 0

Related Questions