Reputation: 33

repeat a row in pandas n times based on the value of other two columns

I have a pandas data frame with 4 columns, A, B, C, D:

A   B   C   D
1   0   2   ["apple", "pear", "peach"]
2   2   3   ["cherry"]
3   3   3   ["banana", "cherry"]
4   4   7   []
5   1   3   ["apple", "grapes"]

I want to get the following results by repeating each row n times, where n equals to the value of (column C - column B), and updating the value in column B with the updated value (+1 for each repeat).

A   B   C   D
1   0   2   ["apple", "pear", "peach"]
1   1   2   ["apple", "pear", "peach"]
1   2   2   ["apple", "pear", "peach"]
2   2   3   ["cherry"]
2   3   3   ["cherry"]
3   3   3   ["banana", "cherry"]
4   4   7   []
4   5   7   []
4   6   7   []
4   7   7   []
5   1   3   ["apple", "grapes"]
5   2   3   ["apple", "grapes"]
5   3   3   ["apple", "grapes"]

Not sure in pandas how could I achieve this? Thanks.

edited Column D may not be unique to each B value, so group by may not work here.

Upvotes: 2

Answers (2)

Joe Ferndz

Reputation: 8508

You can do index.repeat and groupby.cumcount.

import pandas as pd
c = ['A','B','C','D']
d = [[1,   0,   2,   ["apple", "pear", "peach"]],
     [2,   2,   3,   ["cherry"]],
     [3,   3,   3,   ["banana", "cherry"]],
     [4,   4,   7,   []],
     [5,   1,   3,   ["apple", "grapes"]]]
df = pd.DataFrame(d,columns=c)
df = df.loc[df.index.repeat(df['C']-df['B']+1)]
df['B'] = df['B']+df.groupby(['A']).cumcount()
print (df)

Output is:

   A  B  C                     D
0  1  0  2  [apple, pear, peach]
0  1  1  2  [apple, pear, peach]
0  1  2  2  [apple, pear, peach]
1  2  2  3              [cherry]
1  2  3  3              [cherry]
2  3  3  3      [banana, cherry]
3  4  4  7                    []
3  4  5  7                    []
3  4  6  7                    []
3  4  7  7                    []
4  5  1  3       [apple, grapes]
4  5  2  3       [apple, grapes]
4  5  3  3       [apple, grapes]

See stackoverflow posts for more details:

Grouping and auto increment based on columns in pandas

How to add sequential counter column on groups using Pandas groupby

Upvotes: 1

Mayank Porwal

Reputation: 34086

Use Index.repeat with Groupby.cumcount:

In [1322]: x = df.loc[df.index.repeat(df['C'] - df['B'] + 1)]
In [1324]: x['B'] = x.groupby('D').cumcount() + x['B']

In [1325]: x
Out[1325]: 
   A  B  C                         D
0  1  0  2  ["apple","pear","peach"]
0  1  1  2  ["apple","pear","peach"]
0  1  2  2  ["apple","pear","peach"]
1  2  2  3                ["cherry"]
1  2  3  3                ["cherry"]
2  3  3  3       ["banana","cherry"]
3  4  4  7                        []
3  4  5  7                        []
3  4  6  7                        []
3  4  7  7                        []
4  5  1  3        ["apple","grapes"]
4  5  2  3        ["apple","grapes"]
4  5  3  3        ["apple","grapes"]

Upvotes: 2

repeat a row in pandas n times based on the value of other two columns

Answers (2)

Related Questions