Reputation: 105
This is my excel input
This is my expected output
I am expecting all possible combinations for all comma seperated values of each columns into separate rows
Current work
df = pd.DataFrame()
for file in files:
if file.endswith('.xlsx'):
df = df.append(pd.read_excel('downloads/' + file), ignore_index=True)
df.head()
df.to_excel(r'downloads/merged.xlsx')
df.type_c = df.type_c.str.split(',')
df1 = df.explode('type_c')
df1.language_c = df1.language_c.str.split(',')
df1.explode('language_c')
Here I am exploding multiple columns, Can I get this done in single command, where it can do this exploding for all columns without specifying? OR should it run through a loop for all columns which has ',' in it?
Upvotes: 1
Views: 275
Reputation: 45
Maybe this approach could assist you:
Code
dfx = pd.DataFrame(
data={
'scope': ['internal'],
'type_c': ['bmm, pitcher'],
'subtype_c': ['ad experiment'],
'language_c': ['en, esp'],
}
)
Result
dfx
scope type_c subtype_c language_c
0 internal bmm, pitcher ad experiment en, esp
Code
## Iterable of all column names in dataframe
cols = dfx.columns
## Looping through every column to split and make cross join
for col in cols:
## Exploding a column into various columns using ',' as a separator
dfx2 = dfx[col].str.split(pat=',', expand=True).T
## Renaming the obtained exploded result to match the original column name
dfx2.rename(columns={0: col}, inplace=True)
## Dropping the processed column from the original dataframe
dfx.drop(col, axis=1, inplace=True)
## Conducting a cross join between the 'exploded column' and the original dataframe
dfx = pd.merge(
left=dfx,
right=dfx2,
how='cross',
)
## Ensuring that you only keep the columns from the original list
### Note: this is a 'hard-coded' solution to deal with the additional columns obtained with the cross join
dfx = dfx.loc[:, cols].copy()
Result
dfx
scope type_c subtype_c language_c
0 internal bmm ad experiment en
1 internal bmm ad experiment esp
2 internal pitcher ad experiment en
3 internal pitcher ad experiment esp
Hope this approach helps you!
Upvotes: 0
Reputation: 13
Can just make it a definition.
def explodePandas(files):
global df, df1 # If needed
for file in files:
if file.endswitch('.xlsx'):
df = df.append(pd.read_excel('downloads/' + file), ignore_index = True)
df.head()
df.to_excel(r'downloads/merged.xlsx')
df.type_c = df.type_c.str.split(',')
df1 = df.explode('type_c')
df1.language_c = df1.language_c.str.split(',')
df1.explode('language_c')
explodePandas()
Upvotes: 1