Splitting a column into two new columns with same column name

Question

I want to split the comma-separated pairs of values in each column into two adjacent columns in a new data frame, with both new columns keeping the name of the original column.

That is, I want to convert this:

A   B   C   D   E 
1,1 0,1 1,1 1,1 1,1 
1,1 1,1 1,1 1,1 1,1
0,1 0,1 0,1 0,1 0,1

to this:

A  A  B  B  C  C  D  D  E  E
1  1  0  1  1  1  1  1  1  1
1  1  1  1  1  1  1  1  1  1
0  1  0  1  0  1  0  1  0  1

If data frame names can't be exact duplicates, names such as A_1 and A_2 (and so on) would be fine. Having the names in the first row of the data frame instead of as a header would also be fine.

My actual dataset is ~200 columns by ~13,000 rows, so I need an automated method for splitting columns and assigning names to the second version of the data frame.

Rich Scriven · Accepted Answer

You could use cSplit() from the splitstackshape package:

library(splitstackshape)
(newdf <- cSplit(df, names(df), ","))
#    A_1 A_2 B_1 B_2 C_1 C_2 D_1 D_2 E_1 E_2
# 1:   1   1   0   1   1   1   1   1   1   1
# 2:   1   1   1   1   1   1   1   1   1   1
# 3:   0   1   0   1   0   1   0   1   0   1
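
For readers who cannot install splitstackshape, the same split can be sketched in base R. This is not the method from the answer above, only a minimal alternative under two assumptions: every cell holds exactly one comma, and both halves are integers. The sample `df` is rebuilt from the question, and the helper name `split_pair` is made up for illustration.

```r
# Sample data mirroring the question; cells are kept as character strings.
df <- data.frame(
  A = c("1,1", "1,1", "0,1"),
  B = c("0,1", "1,1", "0,1"),
  C = c("1,1", "1,1", "0,1"),
  D = c("1,1", "1,1", "0,1"),
  E = c("1,1", "1,1", "0,1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one character column on the comma and
# return a two-column integer matrix (one row per original row).
split_pair <- function(x) {
  parts <- do.call(rbind, strsplit(x, ",", fixed = TRUE))
  storage.mode(parts) <- "integer"
  parts
}

# Split every column, bind the pieces side by side, and name the
# result A_1, A_2, B_1, B_2, ... in the original column order.
pieces <- lapply(df, split_pair)
newdf <- as.data.frame(do.call(cbind, pieces))
names(newdf) <- paste(rep(names(df), each = 2), 1:2, sep = "_")
newdf
```

Because the work is done column by column with lapply(), the same code applies unchanged to a wider data frame, as long as the one-comma assumption holds for every cell.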

To create duplicate column names, you can then use setnames(), since data.table is also loaded with splitstackshape (if setnames() is not found, call library(data.table) first):

setnames(newdf, names(newdf), sub("_.*", "", names(newdf)))
newdf
#    A A B B C C D D E E
# 1: 1 1 0 1 1 1 1 1 1 1
# 2: 1 1 1 1 1 1 1 1 1 1
# 3: 0 1 0 1 0 1 0 1 0 1

But just so you know, having duplicate column names is a terrible idea: selecting a column by name becomes ambiguous.
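
A small base R sketch of what can go wrong with duplicate names, and of one way to get unique names instead. It uses a plain data.frame built by hand rather than the data.table from the answer, and the object names `dup`, `fixed`, and `uniq` are made up for illustration.

```r
# Build a data frame with two columns that are both named "A".
# check.names = FALSE is needed, otherwise data.frame() renames them.
dup <- data.frame(A = c(1L, 1L, 0L), A = c(1L, 1L, 1L), check.names = FALSE)
names(dup)

# Selecting by name silently returns only the first matching column,
# so the second "A" can no longer be reached by name.
dup$A

# With the default check.names = TRUE, data.frame() de-duplicates the names.
fixed <- data.frame(A = c(1L, 1L, 0L), A = c(1L, 1L, 1L))
names(fixed)

# make.unique() gives the same kind of de-duplication for any character vector.
uniq <- make.unique(c("A", "A", "B", "B"), sep = "_")
uniq
```

Keeping suffixed names such as A_1 and A_2, as cSplit() produces by default, avoids the problem entirely.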
