Nick Knauer
Nick Knauer

Reputation: 4243

Reshape values from 1 column and attach to existing column name

I have a dataset that looks like this:

Col1     Col2    Col3   Col4    Col5
   A        1       1     10      90
   A        1       2     20     100 
   A        1       3     30     110 
   A        1       4     40     120
   B        2       1     50     130
   B        2       2     60     140 
   B        2       3     70     150
   B        2       4     80     160

How do I reshape this so that I have the value from Col1 be in the column name of all columns from Col4 and on? My actual dataset has like 20 columns.

I want my final output to look like this:

Col2    Col3   Col4_A    Col4_B      Col5_A      Col5_B
   1       1       10        NA          90          NA
   1       2       20        NA         100          NA
   1       3       30        NA         110          NA
   1       4       40        NA         120          NA 
   2       1       NA        50          NA         130
   2       2       NA        60          NA         140
   2       3       NA        70          NA         150
   2       4       NA        80          NA         160

Upvotes: 4

Views: 91

Answers (3)

lmo
lmo

Reputation: 38520

Here is a base R method with model.matrix and sub.

# construct desired data.frame
dat <- as.data.frame(model.matrix(~Col2 + Col3 + Col1:(Col4 + Col5) - 1, df1))
# construct desired names
names(dat) <- sub("^.*(.):(.*)$", "\\2\\1", names(dat))

Here, the formula in model.matrix is used to build out your desired structure. The -1 at the end assures that each level of the interaction appears. Because the function returns a matrix, use as.data.frame to convert it. Then use sub and the regular expression to alter the names of the interaction variables. This returns

dat
  Col2 Col3 Col4A Col4B Col5A Col5B
1    1    1    10     0    90     0
2    1    2    20     0   100     0
3    1    3    30     0   110     0
4    1    4    40     0   120     0
5    1    1     0    50     0   130
6    1    2     0    60     0   140
7    1    3     0    70     0   150
8    1    4     0    80     0   160

Note that if you are planning to use this directly in a model, you probably don't want to convert it to a data.frame. In the case that it stays a matrix, replace names(dat) with colnames(dat).

Use sub("^.*(.):(.*)$", "\\2_\\1", names(dat)) to include the underscores.

To make the first part more dynamic, you can create the formula to the model.matrix function with a function, like this:

f <- function(x) as.formula(paste0("~ Col2 + Col3 + Col1:(",
                                   paste(paste0("Col", x), collapse=" + "), ") -1"))

Then try it,

f(4:7)
~Col2 + Col3 + Col1:(Col4 + Col5 + Col6 + Col7) - 1
<environment: 0x3d2b598>

Upvotes: 1

Sathish
Sathish

Reputation: 12723

using data table. I am showing solution for two different data posted in the question.

library('data.table')

Data-1:

df1 <- read.table(text='Col1     Col2    Col3   Col4    Col5
A        1       1     10      90
                  A        1       2     20     100 
                  A        1       3     30     110 
                  A        1       4     40     120
                  B        1       1     50     130
                  B        1       2     60     140 
                  B        1       3     70     150
                  B        1       4     80     160', header = TRUE)

setDT(df1)
value.var <- names(df1)[!names(df1) %in% c('Col1', 'Col2', 'Col3')]
dcast(df1, Col2 + Col3 ~ Col1, value.var = value.var )
#    Col2 Col3 Col5_A Col5_B Col4_A Col4_B
# 1:    1    1     90    130     10     50
# 2:    1    2    100    140     20     60
# 3:    1    3    110    150     30     70
# 4:    1    4    120    160     40     80

Data-2:

df2 <- read.table(text='Col1     Col2    Col3   Col4    Col5
   A        1       1     10      90
                  A        1       2     20     100 
                  A        1       3     30     110 
                  A        1       4     40     120
                  B        2       1     50     130
                  B        2       2     60     140 
                  B        2       3     70     150
                  B        2       4     80     160', header = TRUE)
setDT(df2)
value.var <- names(df2)[!names(df2) %in% c('Col1', 'Col2', 'Col3')]
dcast(df2, Col2 + Col3 ~ Col1, value.var = value.var )

#    Col2 Col3 Col5_A Col5_B Col4_A Col4_B
# 1:    1    1     90     NA     10     NA
# 2:    1    2    100     NA     20     NA
# 3:    1    3    110     NA     30     NA
# 4:    1    4    120     NA     40     NA
# 5:    2    1     NA    130     NA     50
# 6:    2    2     NA    140     NA     60
# 7:    2    3     NA    150     NA     70
# 8:    2    4     NA    160     NA     80

Upvotes: 2

acylam
acylam

Reputation: 18701

We can use gather, unite and spread from tidyr:

library(dplyr)
library(tidyr)

df %>%
  gather(var, value, -(Col1:Col3)) %>%
  unite(var, var, Col1, sep="_") %>%
  spread(var, value)

Result:

  Col2 Col3 Col4_A Col4_B Col5_A Col5_B
1    1    1     10     NA     90     NA
2    1    2     20     NA    100     NA
3    1    3     30     NA    110     NA
4    1    4     40     NA    120     NA
5    2    1     NA     50     NA    130
6    2    2     NA     60     NA    140
7    2    3     NA     70     NA    150
8    2    4     NA     80     NA    160

Upvotes: 4

Related Questions