Reputation: 51
I have a large dataset with more than 600 columns. an extract looks like this:
category q151_a34 q151_a35 q151_a36 q151_a37 q152_a34 q152_a35 q152_a36 q152_a37
1 men 2 12 37 24 0 7 19 23
2 women 3 10 23 19 2 4 29 16
Now I want to convert/transpose it to the following structure:
category var a34 a35 a36 a37
1 men q151 2 12 37 24
2 men q152 0 7 19 23
3 women q151 3 10 23 19
4 women q152 2 4 29 16
I found a two step solution: step 1: spliting variables
get.first<- function(x) sapply(strsplit(x, "\\_"), `[[`, 1)
data<- within(data, var<- get.first(as.character(variable)))
get.second <- function(x) sapply(strsplit(x, "\\_"), `[[`, 2)
data<- within(data, answer<- get.first(as.character(variable)))
step 2: transposing
freqTest <- dcast(data, category + var~ idAnswer, value.var = "value")
Is there a nicer/more performant way?
Many thanks
Upvotes: 1
Views: 1191
Reputation: 13274
You can use tidyr
from the tidyverse
group of packages for this:
# install.packages('tidyverse')
library(tidyverse)
# Data
dff <- structure(list(category = structure(1:2, .Label = c("men", "women" ), class = "factor"), q151_a34 = 2:3, q151_a35 = c(12L, 10L),
q151_a36 = c(37L, 23L), q151_a37 = c(24L, 19L), q152_a34 = c(0L,
2L), q152_a35 = c(7L, 4L), q152_a36 = c(19L, 29L), q152_a37 = c(23L,
16L)), .Names = c("category", "q151_a34", "q151_a35", "q151_a36", "q151_a37", "q152_a34", "q152_a35", "q152_a36", "q152_a37"), class = "data.frame", row.names = c(NA,
-2L))
# Code
gather(dff, key=Qs, value = values, q151_a34:q152_a37) %>%
separate(Qs, into = c('var','A')) %>%
spread(A, values)
# Output
category var a34 a35 a36 a37
1 men q151 2 12 37 24
2 men q152 0 7 19 23
3 women q151 3 10 23 19
4 women q152 2 4 29 16
Essentially, you first need to move the column names into rows with the help of the gather
function, and then use the separate
function to separate the values by the underscore. This will put both values starting with q
in one column while putting those values starting with a
in another column. From here, the next step is to put back the values starting with a
as your column names, with the help of the spread
function. Combined, these steps should get to your desired output.
If you don't want to deal with getting the names of the columns you are looking to move into rows, then you can try using the following:
gather(dff, key=Qs, value = values, -category) %>%
separate(Qs, into = c('var','A')) %>%
spread(A, values)
This is exactly the same as the previous solution, with the only difference being that you are only concerning yourself with the column you wish to have remain a column. Essentially, this brings down everything but the column name category
.
I hope this helps.
Upvotes: 1