pinegulf

Reputation: 1396

Duplicate row and add string

I want to duplicate each row of a pandas DataFrame and append a string to the `id` of each copy, while keeping the rest of the data intact:

I_have = pd.DataFrame({'id': ['a', 'b', 'c'], 'my_data': [1, 2, 3]})

I want:

id     my_data
a      1
a_dup1 1
a_dup2 1
b      2
b_dup1 2
b_dup2 2
c      3
c_dup1 3
c_dup2 3

I could do this with 1) iterrows() or 2) making three copies of the existing data and appending them, but hopefully there is a more Pythonic way.

This seems to work:

tmp1 = I_have.copy(deep=True)
tmp2 = I_have.copy(deep=True)

tmp1['id'] = tmp1['id']+'_dup1'
tmp2['id'] = tmp2['id']+'_dup2'

pd.concat([I_have, tmp1, tmp2])
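One caveat with the concatenation above: it stacks the copies block by block (a, b, c, a_dup1, b_dup1, ...) and keeps each frame's original index, so the rows are not interleaved as in the desired output. A minimal sketch of one way to get that order, assuming the original index is unique; the loop and the names `copies`, `stacked` and `interleaved` are illustrative only:

```python
import pandas as pd

I_have = pd.DataFrame({"id": ["a", "b", "c"], "my_data": [1, 2, 3]})

# Original frame plus two suffixed copies, as in the snippet above.
copies = [I_have]
for k in (1, 2):
    tmp = I_have.copy()
    tmp["id"] = tmp["id"] + f"_dup{k}"
    copies.append(tmp)

# concat keeps each frame's index (0, 1, 2 repeated three times).
stacked = pd.concat(copies)

# A stable sort on that index groups rows by their source row while
# preserving the original / _dup1 / _dup2 order within each group.
interleaved = stacked.sort_index(kind="mergesort").reset_index(drop=True)
print(interleaved)
```

The stable sort kind matters here: an unstable sort could shuffle the copies within each group.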

Upvotes: 1

Views: 114

Answers (1)

jezrael

Reputation: 862521

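Use Index.repeat with DataFrame.loc to repeat each row N times, then build a 0..N-1 counter for each group of copies with numpy.tile. Finally, use Series.mask to append the '_dup' suffix only where the counter is not 0, so the first row of each group keeps its original id: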

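import numpy as np
import pandas as pd

N = 3
df = I_have.copy()  # the DataFrame from the question
df = df.loc[df.index.repeat(N)].reset_index(drop=True)

# counter 0, 1, ..., N-1 repeated once per original row
# (len(df) // N is the original row count, so this works for any number of rows)
a = np.tile(np.arange(N), len(df) // N)

# replace id only where the counter is not 0
df['id'] = df['id'].mask(a != 0, df['id'] + '_dup' + a.astype(str))

# alternative solution
# df.loc[a != 0, 'id'] = df['id'] + '_dup' + a.astype(str)

print(df)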
       id  my_data
0       a        1
1  a_dup1        1
2  a_dup2        1
3       b        2
4  b_dup1        2
5  b_dup2        2
6       c        3
7  c_dup1        3
8  c_dup2        3
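The steps above can also be wrapped in a small helper that does not depend on the number of rows matching N. This is only a sketch: the function name `duplicate_with_suffix` and its parameters are hypothetical, it assumes the index is unique (otherwise selecting repeated labels with `.loc` returns extra rows), and it builds the labels with a plain list comprehension instead of `Series.mask`:

```python
import numpy as np
import pandas as pd


def duplicate_with_suffix(frame, column, n_copies, suffix="_dup"):
    """Repeat every row n_copies times; copies 1..n_copies-1 get a numbered suffix.

    Hypothetical helper; assumes frame.index is unique and frame[column] holds strings.
    """
    n_rows = len(frame)
    # Repeat each index label n_copies times, then renumber the rows.
    out = frame.loc[frame.index.repeat(n_copies)].reset_index(drop=True)
    # Counter 0..n_copies-1, once per original row.
    counter = np.tile(np.arange(n_copies), n_rows)
    # Keep the original value where the counter is 0, otherwise append suffix + counter.
    out[column] = [
        value if k == 0 else f"{value}{suffix}{k}"
        for value, k in zip(out[column], counter)
    ]
    return out


I_have = pd.DataFrame({"id": ["a", "b", "c", "d"], "my_data": [1, 2, 3, 4]})
result = duplicate_with_suffix(I_have, "id", 3)
print(result)
```

With four input rows and three copies each, the result has twelve rows, and only the second and third copy of each row carry a suffix.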

Upvotes: 1
