Reputation: 61
I have a csv file that has columns name
, sub_a
, sub_b
, sub_c
, sub_d
, segment
and gender
. I would like create a new column classes
with all the classes (sub
-columns) seperated by comma that each student takes.
What would be the easiest way to accomplish this?
The result dataframe should look like this:
+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1 | 1 | 0 | 1 | 1 | 0 | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1 | 0 | 1 | 1 | 0 | 0 | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1 | 1 | 0 | 1 | 1 | 1 | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1 | 0 | 1 | 0 | 0 | 0 | sub_a, sub_c |
+------+-------+-------+-------+-------+---------+--------+---------------------+
Upvotes: 5
Views: 971
Reputation: 7723
You can use apply
with axis=1
For Ex.: if your dataframe like
df
A_a A_b B_b B_c
0 1 0 0 1
1 0 1 0 1
2 1 0 1 0
you can do
df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
A_a A_b B_b B_c classes
0 1 0 0 1 A_a, B_c
1 0 1 0 1 A_b, B_c
2 1 0 1 0 A_a, B_b
To apply
on specific columns you can filter first using loc
#for your sample data
df_ = df.loc[:,'sub_a':'sub_d'] #or df.loc[:,'sub_a', 'sub_b', 'sub_c', 'sub_d']
df_.apply(lambda x: ', '.join(df_.columns[x==1]), axis = 1)
Upvotes: 1
Reputation: 323366
Let us try dot
s=df.filter(like='sub')
df['classes']=s.astype(bool).dot(s.columns+',').str[:-1]
Upvotes: 2
Reputation: 1902
You can use apply
only on the sub
-columns to apply a lambda function that will join the names of the sub
-columns where the values of the columns equal 1:
sub_cols = ['sub_a', 'sub_b', 'sub_c', 'sub_d']
df['classes'] = df[sub_cols].apply(lambda x: ', '.join(df[sub_cols].columns[x == 1]), axis=1)
Upvotes: 0
Reputation: 322
You indeed want to iterate through the rows. However, you can not directly add the classes to the DataFrame as all columns of the DataFrame need to be equally long. So the trick is to first generate the column and then add it later:
subjects = ['subj_a', 'subj_b', 'subj_c']
classes_per_student [] # the empty column
for _, student in df.iterrows():
# first create a list of the classes taken by this student
classes = [subj for subj in subjects if student[subj]]
# create a single string
classes = ', '.join(classes)
# append to the column under construction
classes_per_student.append(classes)
# and finaly add the column to the DataFrame
df['classes'] = classes_per_student
Upvotes: 0