Kasper
Kasper

Reputation: 121

Split multiple columns into new rows

I have a problem with splitting two columns into new rows.

My dataframe looks like this

enter image description here

"involved" and "team_player_formation" each have 15 strings, that needs to be assigned into each their row. With the first string of "involved" matching first string of "team_player_formation"

I have tried to follow this: Split (explode) pandas dataframe string entry to separate rows and searching for ways to split multiple columns, but without success

I have been able to split one of my columns with the following at the moment

df = pd.read_csv('Hello.csv',  delimiter=';')
df = df.assign(involved=df['involved'].str.split(',')).explode('involved')

Which gives me something like this:

     matchId                                contestantId periodId typeId    involved    team_formation  team_player_formation
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    ..................

    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0

But that only split "involed" into a new column.

The output should be something like this, where I have only showed the first 3 rows.

enter image description here

Thank you! I hope you can help, and that I explained it well enough.

Upvotes: 0

Views: 192

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

If I undstand your problem correctly, you can use this MCVE to help with your analysis.

df = pd.DataFrame({'ID':[1],'string1':['A,B,C'], 'string2':['X,Y,Z']}, index=[0])

df_joined = df.join([df['string1'].str.split(',', expand=True).add_prefix('s1_'), 
                     df['string2'].str.split(',', expand=True).add_prefix('s2_')])

pd.wide_to_long(df_joined, ['s1','s2'], 'ID', 'No', sep='_').reset_index()

Output:

   ID  No string1 string2 s1 s2
0   1   0   A,B,C   X,Y,Z  A  X
1   1   1   A,B,C   X,Y,Z  B  Y
2   1   2   A,B,C   X,Y,Z  C  Z

Upvotes: 2

Related Questions