Nash Vlasov

Reputation: 67

How can one duplicate a column N times in a DataFrame?


I have a DataFrame with one column, and I would like to get a DataFrame with N columns, all identical to the first one. I can duplicate a single column with:

df[['new column name']] = df[['column name']]

but I need more than 1000 identical columns, so doing this one column at a time is not practical. One important detail: the values in the columns should change. For instance, if the first column is 0, the nth column is n and the one before it is n-1.

Upvotes: 5

Views: 3581

Answers (4)

ansev

Reputation: 30930

I think the most efficient approach is to index with DataFrame.loc using a repeated column label instead of an explicit loop:

n = 3
new_df = df.loc[:, ['column_duplicate']*n + 
                   df.columns.difference(['column_duplicate']).tolist()]
print(new_df)

   column_duplicate  column_duplicate  column_duplicate  other
0                 0                 0                 0     10
1                 1                 1                 1     11
2                 2                 2                 2     12
3                 3                 3                 3     13
4                 4                 4                 4     14
5                 5                 5                 5     15
6                 6                 6                 6     16
7                 7                 7                 7     17
8                 8                 8                 8     18
9                 9                 9                 9     19
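The answer does not show how df was built. As a minimal, self-contained sketch, the input below is an assumption reconstructed from the printed output (a `column_duplicate` column holding 0..9 and an `other` column holding 10..19); the name `other_cols` is only illustrative:

```python
import pandas as pd

# Assumed input, reconstructed from the printed output above.
df = pd.DataFrame({'column_duplicate': range(10), 'other': range(10, 20)})

n = 3
# Every column except the one being duplicated.
other_cols = df.columns.difference(['column_duplicate']).tolist()

# Selecting the same label n times yields n copies of that column.
new_df = df.loc[:, ['column_duplicate'] * n + other_cols]
print(new_df.shape)
```

Note that Index.difference returns its labels sorted, so the remaining columns may come out in a different order than in the original frame.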

If you want to add a suffix:

suffix_tup = ('a', 'b', 'c')

not_dup_cols = df.columns.difference(['column_duplicate']).tolist()

new_df = (df.loc[:, ['column_duplicate']*len(suffix_tup) + 
                    not_dup_cols]
            .set_axis(list(map(lambda suffix: f'column_duplicate_{suffix}', 
                               suffix_tup)) + 
                      not_dup_cols, axis=1)
         )
print(new_df)


   column_duplicate_a  column_duplicate_b  column_duplicate_c  other
0                   0                   0                   0     10
1                   1                   1                   1     11
2                   2                   2                   2     12
3                   3                   3                   3     13
4                   4                   4                   4     14
5                   5                   5                   5     15
6                   6                   6                   6     16
7                   7                   7                   7     17
8                   8                   8                   8     18
9                   9                   9                   9     19

Or add an index:

n = 3
not_dup_cols = df.columns.difference(['column_duplicate']).tolist()

new_df = (df.loc[:, ['column_duplicate']*n + 
                    not_dup_cols]
            .set_axis(list(map(lambda x: f'column_duplicate_{x}', range(n))) + 
                      not_dup_cols, axis=1)
         )
print(new_df)

   column_duplicate_0  column_duplicate_1  column_duplicate_2  other
0                   0                   0                   0     10
1                   1                   1                   1     11
2                   2                   2                   2     12
3                   3                   3                   3     13
4                   4                   4                   4     14
5                   5                   5                   5     15
6                   6                   6                   6     16
7                   7                   7                   7     17
8                   8                   8                   8     18
9                   9                   9                   9     19

Upvotes: 1

cs95

Reputation: 402814

df

   A  B  C
0  x  x  x
1  y  x  z

Duplicate column "C" 5 times using df.assign:

n = 5
df2 = df.assign(**{f'C{i}': df['C'] for i in range(1, n+1)})
df2

   A  B  C C1 C2 C3 C4 C5
0  x  x  x  x  x  x  x  x
1  y  x  z  z  z  z  z  z

Set n to 1000 to get your desired output.


You can also directly assign the result back:

df[[f'C{i}' for i in range(1, n+1)]] = df[['C']*n].to_numpy()
df
 
   A  B  C C1 C2 C3 C4 C5
0  x  x  x  x  x  x  x  x
1  y  x  z  z  z  z  z  z

Upvotes: 1

Celius Stingher

Reputation: 18377

If it's a single column, you can transpose it, replicate it with pd.concat, and transpose back to the original layout. This avoids looping and should be faster. You can then rename the columns in a second line without touching the data in the DataFrame, which would be the most expensive part performance-wise:

import pandas as pd
df = pd.DataFrame({'Column':[1,2,3,4,5]})

Original dataframe:

   Column
0       1
1       2
2       3
3       4
4       5

df = pd.concat([df.T]*1000).T

Output:

   Column  Column  Column  Column  ...  Column  Column  Column  Column
0       1       1       1       1  ...       1       1       1       1
1       2       2       2       2  ...       2       2       2       2
2       3       3       3       3  ...       3       3       3       3
3       4       4       4       4  ...       4       4       4       4
4       5       5       5       5  ...       5       5       5       5

[5 rows x 1000 columns]

df.columns = ['Column'+'_'+str(i) for i in range(1000)]
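The question also says the values should change from column to column (0 in the first column, n in the nth). One possible reading, and it is only an assumption, is that copy k should hold the original values plus k. Under that reading, a sketch using NumPy broadcasting could look like this (the names `values` and `wide` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column': [1, 2, 3, 4, 5]})
n = 1000

# A (rows, 1) array plus a length-n vector of offsets 0..n-1
# broadcasts to a (rows, n) array: column k holds original + k.
values = df[['Column']].to_numpy() + np.arange(n)

wide = pd.DataFrame(values, index=df.index,
                    columns=[f'Column_{i}' for i in range(n)])
print(wide.shape)
```

If plain identical copies are wanted instead, dropping the `+ np.arange(n)` term and using `np.tile` or the concat approach above would do.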

Upvotes: 4

sophocles

Reputation: 13831

Say that you have a df with a column named 'company_name' that contains 8 companies:

import pandas as pd
df = pd.DataFrame({"company_name":{"0":"Telia","1":"Proximus","2":"Tmobile","3":"Orange","4":"Telefonica","5":"Verizon","6":"AT&T","7":"Koninklijke"}})

  company_name
0        Telia
1     Proximus
2      Tmobile
3       Orange
4   Telefonica
5      Verizon
6         AT&T
7  Koninklijke

You can use a loop over range to control how many identical columns to create:

for i in range(0,1000):
    df['company_name'+str(i)] = df['company_name']

The resulting shape of df is:

df.shape
(8, 1001)

i.e. the same column was replicated 1000 times. The names of the duplicated columns are the original name plus an integer suffix that increases by 1:

'company_name', 'company_name0', 'company_name1', 'company_name2', ..., 'company_name999'
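Adding 1000 columns one at a time works, but recent pandas versions may emit a PerformanceWarning about a fragmented DataFrame when many columns are inserted in a loop. As a hedged alternative sketch (using a shortened, illustrative company list and the hypothetical names `copies` and `result`), all copies can be built in a single pd.concat call and attached at once:

```python
import pandas as pd

# Shortened example data; the names mirror the answer above.
df = pd.DataFrame({'company_name': ['Telia', 'Proximus', 'Tmobile']})
n = 1000

# Concatenate n references to the same Series side by side;
# `keys` supplies the new column names.
copies = pd.concat([df['company_name']] * n, axis=1,
                   keys=[f'company_name{i}' for i in range(n)])

# Attach all copies to the original frame in one step.
result = pd.concat([df, copies], axis=1)
print(result.shape)
```

The resulting column names match those produced by the loop, so the two approaches should be interchangeable here.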

Upvotes: 1
