programmer
programmer

Reputation: 33

Converting the comma separated values in a cell to different rows by duplicating the other column contents into every row Python

I have a dataset

Name    Type        Cluster      Value
ABC     AA,BB       AZ,YZ        15
LMN     CC,DD,EE    LM,LM,LM     20

with many other columns.

I want to convert it to a dataframe like:

Name    Type        Cluster     Value    TypeSubset    ClusterSubset
ABC     AA, BB      AZ, YZ       15       AA            AZ 
ABC     AA, BB      AZ, YZ       15       BB            YZ
LMN     CC,DD,EE    LM,LM,LM     20       CC            LM
LMN     CC,DD,EE    LM,LM,LM     20       DD            LM 
LMN     CC,DD,EE    LM,LM,LM     20       EE            LM

The dataframe can have many columns. But the Number of elements in Type and Cluster will be same. I just want them separated into different rows and duplicate all the other columns.

How can I do it in python.

I tried

df.set_index(['Type','Cluster'])
   .apply(lambda x: x.astype(str).str.split(',').explode())
   .reset_index())  ```

Not getting the desired result.

Upvotes: 1

Views: 744

Answers (1)

mozway
mozway

Reputation: 262634

assign new columns and explode in parallel.

(df.assign(TypeSubset=df['Type'].str.split(','),
           ClusterSubset=df['Cluster'].str.split(',')
           )
   .explode(['TypeSubset', 'ClusterSubset'])
)

Upvotes: 2

Related Questions