Reputation: 199
I am trying to explode an existing dataframe based on a numeric value in a column. For example, if the column has a numeric value of 3, I want to have 3 of those rows, so on and so on.
Assuming we start with this dataframe:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
2 A3 cc 5
3 A4 dd 4
4 A5 ee 5
5 A6 ff 3
How do we get to this dataframe?
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
1 A2 bb 2
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
5 A6 ff 3
5 A6 ff 3
5 A6 ff 3
I have gotten this to work by using the below code, but I was wondering if there is an easier way to accomplish this without having to manually create the comma separated lists to feed into the explode method.
import pandas as pd
#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'], 'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'], 'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)
print(df1) #print original dataframe
#create my_comma_list column based on number values in calc column
df1.insert(3, 'my_comma_list', '')
df1.loc[df1['calc'] == 1, 'my_comma_list'] = '1'
df1.loc[df1['calc'] == 2, 'my_comma_list'] = '1, 2'
df1.loc[df1['calc'] == 3, 'my_comma_list'] = '1, 2, 3'
df1.loc[df1['calc'] == 4, 'my_comma_list'] = '1, 2, 3, 4'
df1.loc[df1['calc'] == 5, 'my_comma_list'] = '1, 2, 3, 4, 5'
print(df1) #print before row explosion
#explode the rows using the my_comma_list column to get desired number of rows
df1 = df1.assign(my_comma_list=df1['my_comma_list'].str.split(',')).explode('my_comma_list')
#drop the my_comma_list column since we no longer need it
del df1['my_comma_list']
print(df1) #print after row explosion
Upvotes: 4
Views: 1136
Reputation: 8508
You can use Index.repeat
and DataFrame.loc
to repeat rows.
import pandas as pd
#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)
print (df1)
df1 = df1.loc[df1.index.repeat(df1['calc'])]
print (df1)
Output is:
Original DataFrame:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
2 A3 cc 5
3 A4 dd 4
4 A5 ee 5
5 A6 ff 3
Updated DataFrame with repeated rows:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
1 A2 bb 2
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
5 A6 ff 3
5 A6 ff 3
5 A6 ff 3
If you want to repeat rows based on a column value with a reference lookup, you can create a dictionary and identify how many times you want it to repeat, then use map to pass the value.
Let's say, you want to repeat based on the value in inventory_partner
. Then you can do this:
import pandas as pd
inv_partner_dict = {'A1':1, 'A2':2, 'A3':5, 'A4':4,'A5':5,'A6':3}
#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)
print (df1)
df1 = df1.loc[df1.index.repeat(df1['inventory_partner2'].map(inv_partner_dict))]
print (df1)
This will do the same thing.
The output of this will be:
Original DataFrame:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
2 A3 cc 5
3 A4 dd 4
4 A5 ee 5
5 A6 ff 3
Updated DataFrame with repeated rows:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
1 A2 bb 2
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
5 A6 ff 3
5 A6 ff 3
5 A6 ff 3
Upvotes: 5
Reputation: 28644
Use pd.Series.repeat
to get the index and then reindex
:
df.reindex(df.inventory_partner2.repeat(df.calc).index)
Upvotes: 1