Reputation: 617
I have a data frame with following structure:
bad_df <- data.frame(
id = c("id001", "id002", "id003"),
participant.1 = c("Jana", "Marina", "Vasilei"),
participant.2 = c("Niko", "Micha", "Niko"),
role.1 = c("writer", "writer", "speaker"),
role.2 = c("observer", "observer", "observer"),
stringsAsFactors = F
)
bad_df
I would need to gather it into something like this. Each row should contain one id, participant and role.
good_df <- data.frame(
id = c("id001", "id001", "id002", "id002", "id003", "id003"),
participant = c("Jana", "Niko", "Marina", "Micha", "Vasilei", "Niko"),
role = c("writer", "observer", "writer", "observer", "speaker", "observer"),
stringsAsFactors = F
)
good_df
I see there are countless questions very much like this, but I find it very difficult to understand how to apply tidyr
or reshape2
into this situation. I understand this must be somehow possible with gather().
However, the data frame could contain larger number of participants and corresponding roles, so ideally the method would not demand specifying the column names. One solution which I came up with is below, but I don't think it is the most elegant way to go. And I still would need to deal with some data frames containing participant.3, role.3 etc.
good_df2 <- rbind(bad_df %>% select(id, participant.1, role.1) %>%
rename(participant = participant.1, role = role.1),
bad_df %>% select(id, participant.2, role.2) %>%
rename(participant = participant.2, role = role.2))
good_df2
Thank you!
Upvotes: 3
Views: 80
Reputation: 887118
You could try the devel version of data.table
ie. v1.9.5
. Instructions to install are here
library(data.table)
melt(setDT(bad_df), measure=list(grep('participant', names(bad_df)),
grep('role', names(bad_df))))[order(id)][, variable:= NULL]
# id value1 value2
#1: id001 Jana writer
#2: id001 Niko observer
#3: id002 Marina writer
#4: id002 Micha observer
#5: id003 Vasilei speaker
#6: id003 Niko observer
Or we can use merged.stack
where we need to only provide the prefix of the unique columns. Based on the prefix values, it will group the columns that have the same prefix together.
library(splitstackshape)
merged.stack(bad_df, var.stubs=c('participant', 'role'),
sep='var.stubs')[, 2:= NULL]
# id participant role
#1: id001 Jana writer
#2: id001 Niko observer
#3: id002 Marina writer
#4: id002 Micha observer
#5: id003 Vasilei speaker
#6: id003 Niko observer
Or using dplyr/tidyr
library(dplyr)
library(tidyr)
gather(bad_df, Var, Val, -id) %>%
separate(Var, into=c('Var1', 'Var2')) %>%
spread(Var1, Val) %>%
select(-Var2)
# id participant role
#1 id001 Jana writer
#2 id001 Niko observer
#3 id002 Marina writer
#4 id002 Micha observer
#5 id003 Vasilei speaker
#6 id003 Niko observer
Upvotes: 4
Reputation: 24480
I'd go this way in base
R:
#find the participant columns
partCol<-grep("part",colnames(bad_df))
#... and the role columns
roleCol<-grep("role",colnames(bad_df))
data.frame(id=rep(bad_df$id,each=length(partCol)),
partecipant=as.vector(as.matrix(t(bad_df[,partCol]))),
role=as.vector(as.matrix(t(bad_df[,roleCol]))))
Upvotes: 3