Pandas dataframe: Expanding column into rows plus incremental numbering

Question

I need to expand a single row of my Pandas dataframe intwo multiple rows based on splitting the score column (containing match results) based on the spaces.

This is what the data looks like

A   B   score
1   2   6-1 6-2
3   4   6-4 4-6 6-3

To achieve the goal I used the approach from here.

With slightly adapting the approach, my dataframe looks like this:

A   B   score           sets
1   2   6-1 6-2         6-1
1   2   6-1 6-2         6-2
3   4   6-4 4-6 6-3     6-4
3   4   6-4 4-6 6-3     4-6
3   4   6-4 4-6 6-3     6-3

However, I would also like to have another additional column which represents the number of the set per match. It is like a cumulative count of the sets per match. My question is, how can the above linked solution be changed in order to get the desired result which looks as follows:

A   B   score           sets    setnumber
1   2   6-1 6-2         6-1     1
1   2   6-1 6-2         6-2     2
3   4   6-4 4-6 6-3     6-4     1
3   4   6-4 4-6 6-3     4-6     2
3   4   6-4 4-6 6-3     6-3     3

I think somewhere in the following code lines an adaption needs to be done, but I couldn't figure out, yet, how it should work:

s = df['score'].str.split(' ').apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1) # to line up with df's index

Zero · Accepted Answer

You can use repeat and then cumcount

In [2915]: dff = df.set_index(['A', 'B'])['score'].repeat(
                            df['score'].str.split(' ').str.len()
                                 ).reset_index()

In [2916]: dff
Out[2916]:
   A  B        score
0  1  2      6-1 6-2
1  1  2      6-1 6-2
2  3  4  6-4 4-6 6-3
3  3  4  6-4 4-6 6-3
4  3  4  6-4 4-6 6-3

In [2917]: dff.assign(setnumber=dff.groupby(['A', 'B']).cumcount()+1)
Out[2917]:
   A  B        score  setnumber
0  1  2      6-1 6-2          1
1  1  2      6-1 6-2          2
2  3  4  6-4 4-6 6-3          1
3  3  4  6-4 4-6 6-3          2
4  3  4  6-4 4-6 6-3          3

You could also get dff with .loc

In [2923]: df.loc[df.index.repeat(df['score'].str.split(' ').str.len())]
Out[2923]:
   A  B        score
0  1  2      6-1 6-2
0  1  2      6-1 6-2
1  3  4  6-4 4-6 6-3
1  3  4  6-4 4-6 6-3
1  3  4  6-4 4-6 6-3

Pandas dataframe: Expanding column into rows plus incremental numbering

Answers (1)

Related Questions