Cristian
Cristian

Reputation: 179

Move data from rows into columns in R, ignoring missing values and adding column based on the transposed columns

I have a dataset of an experiment where participants were assigned to one of four treatments. The way the data is organized in the dataset, there is a separate set of variables for each treatment, so for each row one set of columns (for the treatment they saw) has valid values and the remaining three sets of columns (for the three treatments they did not see) are all empty.

I would like to rearrange the dataset so that:

  1. One new column designates which treatment each participant was assigned to (i.e. which set of columns has valid values for that row)

  2. A new set of columns reports the valid values for the columns of the treatment the participant was assigned to

Currently, my dataset looks like the one generated by this code:

T1a <- c(1, NA, NA, NA)
T1b <- c(2, NA, NA, NA)
T2a <- c(NA, NA, 3, NA)
T2b <- c(NA, NA, 4, NA)
T3a <- c(NA, 0, NA, NA)
T3b <- c(NA, 5, NA, NA)
T4a <- c(NA, NA, NA, 4)
T4b <- c(NA, NA, NA, 2)
data <- rbind(data.frame(T1a, T1b, T2a, T2b, T3a, T3b, T4a, T4b))

> data
  T1a T1b T2a T2b T3a T3b T4a T4b
1   1   2  NA  NA  NA  NA  NA  NA
2  NA  NA  NA  NA   0   5  NA  NA
3  NA  NA   3   4  NA  NA  NA  NA
4  NA  NA  NA  NA  NA  NA   4   2

I would like to rearrange the dataset as follows:

> data2
   Tr   a   b
1   1   1   2
2   3   0   5
3   2   3   4
4   4   4   2 

I know I could do this manually by creating new columns and assigning them the respective values of the columns without missing data in each row, but I was wondering if there is a faster and more elegant way to do it.

Many thanks in advance for your help!

Upvotes: 2

Views: 174

Answers (2)

DeduciveR
DeduciveR

Reputation: 1702

It's kind-of deprecated, but still useful; reshape2 - you can melt and then dcast. I find them a little easier to use than their tidyverse siblings:

library(reshape2)
library(tidyverse)

result <- data %>%
  melt(value.name = "col") %>% 
  drop_na() %>%
  mutate(letter   = substr(variable, 3, 3),
         variable = substr(variable, 2, 2)) %>%
  dcast(variable~letter, value.var = "col") %>% 
  rename(Tr = variable)

print(result)
   Tr a b
 1  1 1 2
 2  2 3 4
 3  3 0 5
 4  4 4 2

Upvotes: 0

www
www

Reputation: 39164

A solution using dplyr and tidyr.

library(dplyr)
library(tidyr)

data2 <- data %>%
  gather(Column, Value) %>%
  drop_na(Value) %>%
  extract(Column, into = c("Letter", "Tr", "Subject"), regex = "(^T)([0-9]+)([a-z])+$") %>%
  spread(Subject, Value) %>%
  select(-Letter) %>%
  mutate(Tr = as.integer(Tr))
data2
#   Tr a b
# 1  1 1 2
# 2  2 3 4
# 3  3 0 5
# 4  4 4 2

Upvotes: 3

Related Questions