Reputation: 1396
I want to duplicate each row of a pandas DataFrame and append a string to the id of each copy, while keeping the rest of the data intact:
import pandas as pd
I_have = pd.DataFrame({'id': ['a', 'b', 'c'], 'my_data': [1, 2, 3]})
I want:
id my_data
a 1
a_dup1 1
a_dup2 1
b 2
b_dup1 2
b_dup2 2
c 3
c_dup1 3
c_dup2 3
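I could do this with 1) iterrows() or 2) making three copies of the existing data and appending them, but hopefully there is a more pythonic way to do this.
This seems to work, although the rows come out grouped by copy rather than interleaved as in the table above: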
tmp1 = I_have.copy(deep=True)
tmp2 = I_have.copy(deep=True)
tmp1['id'] = tmp1['id']+'_dup1'
tmp2['id'] = tmp2['id']+'_dup2'
pd.concat([I_have, tmp1, tmp2])
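If the interleaved order from the table above matters, the same copy-and-concat idea can be written as a loop and then reordered. This is only a sketch: n_copies and result are names made up here, and it assumes I_have has a default unique integer index, so that a stable sort on the index puts each original row directly before its copies.

```python
import pandas as pd

I_have = pd.DataFrame({"id": ["a", "b", "c"], "my_data": [1, 2, 3]})

n_copies = 2  # hypothetical name: number of extra copies per row

# original frame first, then one renamed copy per suffix
parts = [I_have] + [
    I_have.assign(id=I_have["id"] + f"_dup{i}") for i in range(1, n_copies + 1)
]

# concat keeps the original index labels (0, 1, 2 repeated), so a stable
# sort on the index groups every row with its copies, in the order of parts
result = pd.concat(parts).sort_index(kind="mergesort").reset_index(drop=True)
print(result)
```

mergesort is used because it is stable, so ties on the index keep the order in which the parts were listed.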
Upvotes: 1
Views: 114
Reputation: 862521
Use Index.repeat with DataFrame.loc to duplicate the rows, then build a counter with numpy.tile, and finally append the suffix where the counter is not 0 with Series.mask:
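import numpy as np
N = 3
df = I_have.copy()  # the DataFrame from the question
# counter 0..N-1 once per original row; use len(df), not N, so it works for any row count
a = np.tile(np.arange(N), len(df))
df = df.loc[df.index.repeat(N)].reset_index(drop=True)
df['id'] = df['id'].mask(a != 0, df['id'] + '_dup' + a.astype(str))
#alternative solution
#df.loc[a != 0, 'id'] = df['id'] + '_dup' + a.astype(str)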
print (df)
id my_data
0 a 1
1 a_dup1 1
2 a_dup2 1
3 b 2
4 b_dup1 2
5 b_dup2 2
6 c 3
7 c_dup1 3
8 c_dup2 3
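For reuse, the repeat-then-suffix steps could be wrapped in a small helper along these lines. It is a sketch, not part of pandas: duplicate_with_suffix, col and tag are hypothetical names, and it assumes the input has a unique index, since DataFrame.loc with repeated labels would otherwise return extra rows. The suffixes are built as a plain list here instead of with numpy.tile, which is a variation on the approach above.

```python
import pandas as pd


def duplicate_with_suffix(frame, n, col="id", tag="_dup"):
    """Repeat each row n times; copies 1..n-1 get tag + counter appended to col."""
    # repeat every index label n times so copies sit next to their original
    out = frame.loc[frame.index.repeat(n)].reset_index(drop=True)
    # "" for the original row, then _dup1, _dup2, ... for the copies
    suffixes = [""] + [f"{tag}{i}" for i in range(1, n)]
    # the suffix pattern repeats once per original row
    out[col] = [value + s for value, s in zip(out[col], suffixes * len(frame))]
    return out


example = pd.DataFrame({"id": ["a", "b"], "my_data": [1, 2]})
print(duplicate_with_suffix(example, 3))
```

Because the suffix pattern is repeated len(frame) times, the helper does not depend on the number of rows being equal to n.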
Upvotes: 1