Reputation: 179
I have a dataset of an experiment where participants were assigned to one of four treatments. The way the data is organized in the dataset, there is a separate set of variables for each treatment, so for each row one set of columns (for the treatment they saw) has valid values and the remaining three sets of columns (for the three treatments they did not see) are all empty.
I would like to rearrange the dataset so that:
One new column designates which treatment each participant was assigned to (i.e. which set of columns has valid values for that row)
A new set of columns reports the valid values for the columns of the treatment the participant was assigned to
Currently, my dataset looks like the one generated by this code:
T1a <- c(1, NA, NA, NA)
T1b <- c(2, NA, NA, NA)
T2a <- c(NA, NA, 3, NA)
T2b <- c(NA, NA, 4, NA)
T3a <- c(NA, 0, NA, NA)
T3b <- c(NA, 5, NA, NA)
T4a <- c(NA, NA, NA, 4)
T4b <- c(NA, NA, NA, 2)
data <- rbind(data.frame(T1a, T1b, T2a, T2b, T3a, T3b, T4a, T4b))
> data
T1a T1b T2a T2b T3a T3b T4a T4b
1 1 2 NA NA NA NA NA NA
2 NA NA NA NA 0 5 NA NA
3 NA NA 3 4 NA NA NA NA
4 NA NA NA NA NA NA 4 2
I would like to rearrange the dataset as follows:
> data2
Tr a b
1 1 1 2
2 3 0 5
3 2 3 4
4 4 4 2
I know I could do this manually by creating new columns and assigning them the respective values of the columns without missing data in each row, but I was wondering if there is a faster and more elegant way to do it.
Many thanks in advance for your help!
Upvotes: 2
Views: 174
Reputation: 1702
It's kind-of deprecated, but still useful; reshape2
- you can melt
and then dcast
. I find them a little easier to use than their tidyverse
siblings:
library(reshape2)
library(tidyverse)
result <- data %>%
melt(value.name = "col") %>%
drop_na() %>%
mutate(letter = substr(variable, 3, 3),
variable = substr(variable, 2, 2)) %>%
dcast(variable~letter, value.var = "col") %>%
rename(Tr = variable)
print(result)
Tr a b
1 1 1 2
2 2 3 4
3 3 0 5
4 4 4 2
Upvotes: 0
Reputation: 39164
A solution using dplyr
and tidyr
.
library(dplyr)
library(tidyr)
data2 <- data %>%
gather(Column, Value) %>%
drop_na(Value) %>%
extract(Column, into = c("Letter", "Tr", "Subject"), regex = "(^T)([0-9]+)([a-z])+$") %>%
spread(Subject, Value) %>%
select(-Letter) %>%
mutate(Tr = as.integer(Tr))
data2
# Tr a b
# 1 1 1 2
# 2 2 3 4
# 3 3 0 5
# 4 4 4 2
Upvotes: 3