Reputation: 33
I have a pandas data frame with 4 columns, A, B, C, D:
A B C D
1 0 2 ["apple", "pear", "peach"]
2 2 3 ["cherry"]
3 3 3 ["banana", "cherry"]
4 4 7 []
5 1 3 ["apple", "grapes"]
I want to get the following results by repeating each row n times, where n equals to the value of (column C - column B), and updating the value in column B with the updated value (+1 for each repeat).
A B C D
1 0 2 ["apple", "pear", "peach"]
1 1 2 ["apple", "pear", "peach"]
1 2 2 ["apple", "pear", "peach"]
2 2 3 ["cherry"]
2 3 3 ["cherry"]
3 3 3 ["banana", "cherry"]
4 4 7 []
4 5 7 []
4 6 7 []
4 7 7 []
5 1 3 ["apple", "grapes"]
5 2 3 ["apple", "grapes"]
5 3 3 ["apple", "grapes"]
Not sure in pandas how could I achieve this? Thanks.
edited Column D may not be unique to each B value, so group by may not work here.
Upvotes: 2
Views: 595
Reputation: 8508
You can do index.repeat and groupby.cumcount.
import pandas as pd
c = ['A','B','C','D']
d = [[1, 0, 2, ["apple", "pear", "peach"]],
[2, 2, 3, ["cherry"]],
[3, 3, 3, ["banana", "cherry"]],
[4, 4, 7, []],
[5, 1, 3, ["apple", "grapes"]]]
df = pd.DataFrame(d,columns=c)
df = df.loc[df.index.repeat(df['C']-df['B']+1)]
df['B'] = df['B']+df.groupby(['A']).cumcount()
print (df)
Output is:
A B C D
0 1 0 2 [apple, pear, peach]
0 1 1 2 [apple, pear, peach]
0 1 2 2 [apple, pear, peach]
1 2 2 3 [cherry]
1 2 3 3 [cherry]
2 3 3 3 [banana, cherry]
3 4 4 7 []
3 4 5 7 []
3 4 6 7 []
3 4 7 7 []
4 5 1 3 [apple, grapes]
4 5 2 3 [apple, grapes]
4 5 3 3 [apple, grapes]
See stackoverflow posts for more details:
Grouping and auto increment based on columns in pandas
How to add sequential counter column on groups using Pandas groupby
Upvotes: 1
Reputation: 34046
Use Index.repeat
with Groupby.cumcount
:
In [1322]: x = df.loc[df.index.repeat(df['C'] - df['B'] + 1)]
In [1324]: x['B'] = x.groupby('D').cumcount() + x['B']
In [1325]: x
Out[1325]:
A B C D
0 1 0 2 ["apple","pear","peach"]
0 1 1 2 ["apple","pear","peach"]
0 1 2 2 ["apple","pear","peach"]
1 2 2 3 ["cherry"]
1 2 3 3 ["cherry"]
2 3 3 3 ["banana","cherry"]
3 4 4 7 []
3 4 5 7 []
3 4 6 7 []
3 4 7 7 []
4 5 1 3 ["apple","grapes"]
4 5 2 3 ["apple","grapes"]
4 5 3 3 ["apple","grapes"]
Upvotes: 2