stda
stda

Reputation: 51

Transpose groups of columns to rows

I have a large dataset with more than 600 columns. an extract looks like this:

  category q151_a34 q151_a35 q151_a36 q151_a37 q152_a34 q152_a35 q152_a36 q152_a37
1 men      2         12      37       24       0        7        19       23
2 women    3         10      23       19       2        4        29       16

Now I want to convert/transpose it to the following structure:

  category  var     a34 a35 a36 a37
1 men       q151    2   12   37  24
2 men       q152    0   7    19  23
3 women     q151    3   10   23  19
4 women     q152    2   4    29  16

I found a two step solution: step 1: spliting variables

get.first<- function(x) sapply(strsplit(x, "\\_"), `[[`, 1)
data<- within(data, var<- get.first(as.character(variable)))
get.second <- function(x) sapply(strsplit(x, "\\_"), `[[`, 2)
data<- within(data, answer<- get.first(as.character(variable)))

step 2: transposing

freqTest <- dcast(data, category + var~ idAnswer, value.var = "value")

Is there a nicer/more performant way?

Many thanks

Upvotes: 1

Views: 1191

Answers (1)

Abdou
Abdou

Reputation: 13274

You can use tidyr from the tidyverse group of packages for this:

# install.packages('tidyverse')


library(tidyverse)

# Data

dff <- structure(list(category = structure(1:2, .Label = c("men", "women" ), class = "factor"), q151_a34 = 2:3, q151_a35 = c(12L, 10L), 
    q151_a36 = c(37L, 23L), q151_a37 = c(24L, 19L), q152_a34 = c(0L, 
    2L), q152_a35 = c(7L, 4L), q152_a36 = c(19L, 29L), q152_a37 = c(23L, 
    16L)), .Names = c("category", "q151_a34", "q151_a35", "q151_a36",  "q151_a37", "q152_a34", "q152_a35", "q152_a36", "q152_a37"), class = "data.frame", row.names = c(NA, 
-2L))

# Code

gather(dff, key=Qs, value = values, q151_a34:q152_a37) %>%
    separate(Qs, into = c('var','A')) %>%
    spread(A, values)


# Output

  category  var a34 a35 a36 a37
1      men q151   2  12  37  24
2      men q152   0   7  19  23
3    women q151   3  10  23  19
4    women q152   2   4  29  16

Essentially, you first need to move the column names into rows with the help of the gather function, and then use the separate function to separate the values by the underscore. This will put both values starting with q in one column while putting those values starting with a in another column. From here, the next step is to put back the values starting with a as your column names, with the help of the spread function. Combined, these steps should get to your desired output.

Edit:

If you don't want to deal with getting the names of the columns you are looking to move into rows, then you can try using the following:

gather(dff, key=Qs, value = values, -category) %>%
    separate(Qs, into = c('var','A')) %>%
    spread(A, values)

This is exactly the same as the previous solution, with the only difference being that you are only concerning yourself with the column you wish to have remain a column. Essentially, this brings down everything but the column name category.

I hope this helps.

Upvotes: 1

Related Questions