Yash
Yash

Reputation: 105

Multi column explode in Pandas Python

This is my excel input

enter image description here

This is my expected output

I am expecting all possible combinations for all comma seperated values of each columns into separate rows

enter image description here

Current work

df = pd.DataFrame()
for file in files:
   if file.endswith('.xlsx'):
       df = df.append(pd.read_excel('downloads/' + file), ignore_index=True) 
df.head() 
df.to_excel(r'downloads/merged.xlsx')

df.type_c = df.type_c.str.split(',')
df1 = df.explode('type_c') 

df1.language_c = df1.language_c.str.split(',')
df1.explode('language_c')

Here I am exploding multiple columns, Can I get this done in single command, where it can do this exploding for all columns without specifying? OR should it run through a loop for all columns which has ',' in it?

Upvotes: 1

Views: 275

Answers (2)

robperch
robperch

Reputation: 45

Maybe this approach could assist you:

Generating working dataframe

  • Code

    dfx = pd.DataFrame(
        data={
            'scope': ['internal'],
            'type_c': ['bmm, pitcher'],
            'subtype_c': ['ad experiment'],
            'language_c': ['en, esp'],
        }
    )
    
  • Result

    dfx
    
        scope       type_c          subtype_c       language_c
    0   internal    bmm, pitcher    ad experiment   en, esp
    

Exploding rows based on entries that contain a comma

  • Code

    ## Iterable of all column names in dataframe
    cols = dfx.columns
    
    ## Looping through every column to split and make cross join
    for col in cols:
    
        ## Exploding a column into various columns using ',' as a separator
        dfx2 = dfx[col].str.split(pat=',', expand=True).T
    
        ## Renaming the obtained exploded result to match the original column name
        dfx2.rename(columns={0: col}, inplace=True)
    
        ## Dropping the processed column from the original dataframe
        dfx.drop(col, axis=1, inplace=True)
    
        ## Conducting a cross join between the 'exploded column' and the original dataframe
        dfx = pd.merge(
            left=dfx,
            right=dfx2,
            how='cross',
        )
    
    ## Ensuring that you only keep the columns from the original list
    ### Note: this is a 'hard-coded' solution to deal with the additional columns obtained with the cross join
    dfx = dfx.loc[:, cols].copy()
    
  • Result

    dfx
    
        scope   type_c          subtype_c       language_c
    0   internal    bmm         ad experiment   en
    1   internal    bmm         ad experiment   esp
    2   internal    pitcher     ad experiment   en
    3   internal    pitcher     ad experiment   esp
    

Hope this approach helps you!

Upvotes: 0

Matthew McDermott
Matthew McDermott

Reputation: 13

Can just make it a definition.

def explodePandas(files):
    global df, df1 # If needed
    for file in files:
        if file.endswitch('.xlsx'):
            df = df.append(pd.read_excel('downloads/' + file), ignore_index = True)
    df.head()

    df.to_excel(r'downloads/merged.xlsx')

    df.type_c = df.type_c.str.split(',')
    df1 = df.explode('type_c') 

    df1.language_c = df1.language_c.str.split(',')
    df1.explode('language_c')

explodePandas()

Upvotes: 1

Related Questions