user27976
user27976

Reputation: 903

dealing with multiple values in a column using python

I'm new to pandas and pivot functions in python and need some help.

How can I use python pandas or pivot table to covert this table to the output below:

No  class
1   U234
2   U234,U237,U239
3   U238,U239,U234
4   U237,U234

Required output:

No  class
1   U234
2   U234
2   U237
2   U239
3   U238
3   U239
3   U234
4   U237
4   U234

Upvotes: 1

Views: 1046

Answers (1)

jezrael
jezrael

Reputation: 863236

You can first split column class to DataFrame, stack it and create new Series with reset_index and rename:

print (df['class']
             .str
             .split(',', expand=True)
             .stack()
             .reset_index(drop=True, level=1)
             .rename('class'))
0    U234
0    U237
1    U234
1    U237
1    U239
2    U238
2    U239
2    U234
3    U237
3    U234
Name: class, dtype: object

Then drop original column class and join new Series:

print (df.drop('class', axis=1)
             .join
             (
             df['class']
             .str
             .split(',', expand=True)
             .stack()
             .reset_index(drop=True, level=1)
             .rename('class')           
             ))

   No class
0   1  U234
0   1  U237
1   2  U234
1   2  U237
1   2  U239
2   3  U238
2   3  U239
2   3  U234
3   4  U237
3   4  U234

If in column class in NO NaN values:

print (pd.DataFrame([ x.split(',') for x in df['class'].tolist() ])
         .stack()
         .reset_index(drop=True, level=1)
         .rename('class'))
0    U234
0    U237
1    U234
1    U237
1    U239
2    U238
2    U239
2    U234
3    U237
3    U234
Name: class, dtype: object

Upvotes: 2

Related Questions