Reputation: 39
Let's say I have a DataFrame as follows:
index | col 1 | col2 |
---|---|---|
0 | a01 | a02 |
1 | a11 | a12 |
2 | a21 | a22 |
I want to duplicate a row n times and then insert the duplicated rows at a certain index. For example, duplicating row 0 twice and inserting the copies before the original row 0:
index | col 1 | col2 |
---|---|---|
0 | a01 | a02 |
1 | a01 | a02 |
2 | a01 | a02 |
3 | a11 | a12 |
4 | a21 | a22 |
What I'm doing now is creating an empty dataframe and then filling it with the values of the row that I want duplicated.
# create empty dataframe with 2 rows
temp_df = pd.DataFrame(columns=original_df.columns, index=list(range(2)))
# replacing each column by target value, I don't know how to do this more efficiently
temp_df.iloc[:,0] = original_df.iloc[0,0]
temp_df.iloc[:,1] = original_df.iloc[0,1]
temp_df.iloc[:,2] = original_df.iloc[0,2]
# concat the dataframes together
# to insert at indexes in the middle, I would have to slice the original_df and concat separately
original_df = pd.concat([temp_df, original_df])
This seems like a terribly obtuse way to do something I presume should be quite simple. How should I accomplish this more easily?
Upvotes: 2
Views: 6929
Reputation: 16162
This could work. Reset the index to a column so you can sort on it at the end. Take the row you want, repeat it with np.repeat, and concat the copies onto the original df. Then sort on the index column, drop it, and reset the index.
import pandas as pd
import numpy as np
df = pd.DataFrame({'index': [0, 1, 2],
                   'col 1': ['a01', 'a11', 'a21'],
                   'col2': ['a02', 'a12', 'a22']})
index_to_copy = 0
number_of_extra_copies = 2
(pd.concat([df,
            pd.DataFrame(np.repeat(df.iloc[[index_to_copy]].values,
                                   number_of_extra_copies,
                                   axis=0),
                         columns=df.columns)])
   .sort_values(by='index')
   .drop(columns='index')
   .reset_index(drop=True))
Output
  col 1 col2
0   a01  a02
1   a01  a02
2   a01  a02
3   a11  a12
4   a21  a22
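If you need the copies at an arbitrary position rather than next to the original row, slicing and concatenating as the question suggests can be wrapped in a small helper. This is only a sketch: `duplicate_row` and its parameters are hypothetical names, not part of pandas, and it assumes positions are given as integer row positions (iloc-style) and that the final index should simply be renumbered from 0.

```python
import pandas as pd


def duplicate_row(df, row_pos, n_copies, insert_at):
    """Return a new frame with n_copies of row `row_pos` inserted before position `insert_at`.

    Hypothetical helper: positions are integer row positions, not index labels.
    """
    # Selecting the same position n_copies times yields n_copies identical rows
    # and keeps the original column dtypes.
    copies = df.iloc[[row_pos] * n_copies]
    # Split the original frame at the insertion point and stitch the pieces together;
    # ignore_index=True renumbers the result 0..n-1.
    return pd.concat([df.iloc[:insert_at], copies, df.iloc[insert_at:]],
                     ignore_index=True)


df = pd.DataFrame({'col 1': ['a01', 'a11', 'a21'],
                   'col2': ['a02', 'a12', 'a22']})

# Two extra copies of row 0, inserted before the original row 0.
result = duplicate_row(df, row_pos=0, n_copies=2, insert_at=0)
print(result)

# The same copies inserted before row 2 instead.
middle = duplicate_row(df, row_pos=0, n_copies=2, insert_at=2)
print(middle)
```

Here `df` has no separate 'index' column, so there is nothing to sort on or drop afterwards. The first call should give the same five rows as the output above.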
Upvotes: 1