Reputation: 61

Duplicating columns in pandas dataframe

I'm looking for a way to duplicate every column in a dataframe, so that each copy sits next to its original and carries the original name with '_2' appended.

Example:

import pandas as pd

# Starting dataframe
d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)

# Desired result
d2 = {'col1': [1, 2], 'col1_2': [1, 2], 'col2': [3, 4], 'col2_2': [3, 4]}
end_df = pd.DataFrame(data=d2)

Thanks.

Upvotes: 0

Answers (4)

hannez

Reputation: 31

Adding to Akmal Soliev's answer: if you want each duplicated column directly after its original column, you have to adjust his code as follows:

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
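df = pd.DataFrame(data=d)

# Iterate over a snapshot of the original column names, since the loop adds columns
for col in list(df.columns):
    # Place the copy (values included) immediately to the right of its source column
    df.insert(df.columns.get_loc(col) + 1, col + '_2', df[col])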

df

Upvotes: 1

Akmal Soliev

Reputation: 742

Use the DataFrame.insert() method:

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)

for i, col in enumerate(start_df.columns):
    start_df.insert(i+1, col+'_2', start_df[col])
start_df

Output (note that col2_2 ends up before col2):

Out[1]:
   col1  col1_2  col2_2  col2
0     1       1       3     3
1     2       2       4     4
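
The order drifts because every insert shifts the later columns one place to the right, so position i+1 is only correct for the first column. A minimal sketch of one possible adjustment, assuming each copy should sit directly after its original: by the time the i-th original column is processed, i copies have already been inserted before it, so its copy belongs at position 2*i + 1. That offset is an assumption of this sketch, not part of the code above.

```python
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)

# Snapshot the original names first, because the loop adds columns
for i, col in enumerate(list(start_df.columns)):
    # i copies already sit to the left, so the i-th original is at 2*i
    start_df.insert(2 * i + 1, col + '_2', start_df[col])

print(list(start_df.columns))  # ['col1', 'col1_2', 'col2', 'col2_2']
```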

Upvotes: 1

mozway

Reputation: 262519

NB: this answer demonstrates a generalization of the process.

You can build the dataframe without any loop by simply using the repeat method of the columns index.

Then you can set the column names programmatically with a list comprehension.

For 2 repeats:

end_df = start_df[start_df.columns.repeat(2)]
end_df.columns = [f'{a}{b}' for a in start_df for b in ('', '_2')]

output:

   col1  col1_2  col2  col2_2
0     1       1     3       3
1     2       2     4       4

Generalization:

n = 5

end_df = start_df[start_df.columns.repeat(n)]
end_df.columns = [f'{a}{b}' for a in start_df
                            for b in ['']+[f'_{x+1}' for x in range(1,n)]]

Example n=5:

   col1  col1_2  col1_3  col1_4  col1_5  col2  col2_2  col2_3  col2_4  col2_5
0     1       1       1       1       1     3       3       3       3       3
1     2       2       2       2       2     4       4       4       4       4
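
If this is needed in more than one place, the two steps can be wrapped in a small helper. This is only a sketch under the same assumptions as above; the function name repeat_columns and the n >= 1 check are hypothetical additions for illustration.

```python
import pandas as pd

def repeat_columns(df, n):
    """Return a new dataframe with every column repeated n times.

    The first occurrence keeps its name; later ones get '_2', '_3', ...
    (hypothetical helper, not a pandas function).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    suffixes = [''] + [f'_{k}' for k in range(2, n + 1)]
    # Selecting with a repeated index yields each column n times in a row
    out = df[df.columns.repeat(n)].copy()
    out.columns = [f'{col}{suffix}' for col in df.columns for suffix in suffixes]
    return out

start_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
end_df = repeat_columns(start_df, 3)
print(list(end_df.columns))
# ['col1', 'col1_2', 'col1_3', 'col2', 'col2_2', 'col2_3']
```

The helper returns a copy, so start_df itself is left unchanged.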

Upvotes: 1

Gedas Miksenas

Reputation: 1059

Try this:

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)

# Assigning to a new name appends the copy after all existing columns
for column in start_df.columns:
    start_df[column + '_2'] = start_df[column]
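
Note that plain assignment appends each copy at the end, giving the order col1, col2, col1_2, col2_2. If the interleaved order from the question matters, one option is to reorder afterwards. A minimal sketch, where the variable names original_cols and interleaved are just illustrative:

```python
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)
original_cols = list(start_df.columns)

end_df = start_df.copy()
for column in original_cols:
    end_df[column + '_2'] = end_df[column]

# Build the desired order: each original followed by its copy
interleaved = [name for col in original_cols for name in (col, col + '_2')]
end_df = end_df[interleaved]

print(list(end_df.columns))  # ['col1', 'col1_2', 'col2', 'col2_2']
```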

Upvotes: 1
