Reputation: 4898
My question is similar to one asked here. I have a dataframe and I want to repeat each row of the dataframe k times. Along with that, I also want to create a column with values 0 to k-1. So:
import pandas as pd
df = pd.DataFrame(data={
    'id': ['A', 'B', 'C'],
    'n' : [  1,   2,   3],
    'v' : [ 10,  13,   8]
})
what_i_want = pd.DataFrame(data={
    'id': ['A', 'B', 'B', 'C', 'C', 'C'],
    'n' : [  1,   2,   2,   3,   3,   3],
    'v' : [ 10,  13,  13,   8,   8,   8],
    'repeat_id': [0, 0, 1, 0, 1, 2]
})
The command below does half of the job. I am looking for a pandas way of adding the repeat_id column.
df.loc[df.index.repeat(df.n)]
Upvotes: 7
Views: 1184
Reputation: 863751
Use GroupBy.cumcount, and call copy to avoid a SettingWithCopyWarning. If you modify values in df1 later without the copy, the modifications will not propagate back to the original data (df), and pandas may issue a warning.
df1 = df.loc[df.index.repeat(df.n)].copy()
df1['repeat_id'] = df1.groupby(level=0).cumcount()
df1 = df1.reset_index(drop=True)
print(df1)
  id  n   v  repeat_id
0  A  1  10          0
1  B  2  13          0
2  B  2  13          1
3  C  3   8          0
4  C  3   8          1
5  C  3   8          2
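For anyone who wants to check this end to end, here is a minimal self-contained sketch that wraps the three steps in a function and compares the result with the frame from the question. The helper name repeat_rows and its parameters count_col and id_col are my own, purely for illustration. One assumption to be aware of: groupby(level=0) groups on the original index labels, so this relies on df having a unique index before the rows are repeated.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'C'],
    'n': [1, 2, 3],
    'v': [10, 13, 8],
})

def repeat_rows(frame, count_col='n', id_col='repeat_id'):
    """Hypothetical helper: repeat each row frame[count_col] times and number the repeats.

    Assumes frame has a unique index, because the repeats are counted
    per original index label.
    """
    # Repeat each index label n times; .copy() gives an independent frame to assign into.
    out = frame.loc[frame.index.repeat(frame[count_col])].copy()
    # Repeated rows share their original index label, so cumcount numbers them 0..n-1.
    out[id_col] = out.groupby(level=0).cumcount()
    # Replace the duplicated labels with a fresh 0..N-1 index.
    return out.reset_index(drop=True)

result = repeat_rows(df)

# The target frame from the question.
expected = pd.DataFrame({
    'id': ['A', 'B', 'B', 'C', 'C', 'C'],
    'n': [1, 2, 2, 3, 3, 3],
    'v': [10, 13, 13, 8, 8, 8],
    'repeat_id': [0, 0, 1, 0, 1, 2],
})

# Raises AssertionError if values, dtypes, or index differ.
pd.testing.assert_frame_equal(result, expected)
```

If the index of df is not unique, calling reset_index(drop=True) on df before passing it in should restore the assumption.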
Upvotes: 4