Reputation: 5686
I need to expand a single row of my Pandas dataframe intwo multiple rows based on splitting the score
column (containing match results) based on the spaces.
This is what the data looks like
A B score
1 2 6-1 6-2
3 4 6-4 4-6 6-3
To achieve the goal I used the approach from here.
With slightly adapting the approach, my dataframe looks like this:
A B score sets
1 2 6-1 6-2 6-1
1 2 6-1 6-2 6-2
3 4 6-4 4-6 6-3 6-4
3 4 6-4 4-6 6-3 4-6
3 4 6-4 4-6 6-3 6-3
However, I would also like to have another additional column which represents the number of the set per match. It is like a cumulative count of the sets per match. My question is, how can the above linked solution be changed in order to get the desired result which looks as follows:
A B score sets setnumber
1 2 6-1 6-2 6-1 1
1 2 6-1 6-2 6-2 2
3 4 6-4 4-6 6-3 6-4 1
3 4 6-4 4-6 6-3 4-6 2
3 4 6-4 4-6 6-3 6-3 3
I think somewhere in the following code lines an adaption needs to be done, but I couldn't figure out, yet, how it should work:
s = df['score'].str.split(' ').apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
Upvotes: 0
Views: 75
Reputation: 76947
You can use repeat
and then cumcount
In [2915]: dff = df.set_index(['A', 'B'])['score'].repeat(
df['score'].str.split(' ').str.len()
).reset_index()
In [2916]: dff
Out[2916]:
A B score
0 1 2 6-1 6-2
1 1 2 6-1 6-2
2 3 4 6-4 4-6 6-3
3 3 4 6-4 4-6 6-3
4 3 4 6-4 4-6 6-3
In [2917]: dff.assign(setnumber=dff.groupby(['A', 'B']).cumcount()+1)
Out[2917]:
A B score setnumber
0 1 2 6-1 6-2 1
1 1 2 6-1 6-2 2
2 3 4 6-4 4-6 6-3 1
3 3 4 6-4 4-6 6-3 2
4 3 4 6-4 4-6 6-3 3
You could also get dff
with .loc
In [2923]: df.loc[df.index.repeat(df['score'].str.split(' ').str.len())]
Out[2923]:
A B score
0 1 2 6-1 6-2
0 1 2 6-1 6-2
1 3 4 6-4 4-6 6-3
1 3 4 6-4 4-6 6-3
1 3 4 6-4 4-6 6-3
Upvotes: 3