Reputation: 1
I'm trying to expand my data set using R. I have recorded the observations for each sample and calculated percentages based on those observations. I need now to expand each sample to list each possible observation without doing any calculations. Example of myData: Starting data set:
Sample Observation Percent
A Y 50
A N 50
B Y 10
B N 80
B Don't know 10
Desired data set:
Sample Observation Percent
A Y 50
A N 50
A Don't know NA
B Y 10
B N 80
B Don't know 10
So in this case, I would need to expand all of sample A to include the "Don't know" category and fill that in with "NA".
I have tried
myTable <- table(myData)
TableFrame2 <- data.frame(myTable)
Which expands the data set but messes up the Percentage column (?why). I thought I could merge the percentages back, but I need to match that column to the expanded set by both the sample and Observation columns to get an exact match. Any suggestions?
Upvotes: 0
Views: 215
Reputation: 160437
One way is to merge/join the combinations back into the data. (I altered the data slightly to make it easy to copy/paste here in SO.)
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
Sample Observation Percent
A Y 50
A N 50
B Y 10
B N 80
B Don_t_know 10 ')
Base R
merge(
dat,
expand.grid(Sample = unique(dat$Sample),
Observation = unique(dat$Observation),
stringsAsFactors = FALSE),
by = c("Sample", "Observation"),
all = TRUE
)
# Sample Observation Percent
# 1 A Don_t_know NA
# 2 A N 50
# 3 A Y 50
# 4 B Don_t_know 10
# 5 B N 80
# 6 B Y 10
Tidyverse:
library(dplyr)
library(tidyr)
dat %>%
full_join(
crossing(Sample = unique(dat$Sample), Observation = unique(dat$Observation)),
by = c("Sample", "Observation")
)
# Sample Observation Percent
# 1 A Y 50
# 2 A N 50
# 3 B Y 10
# 4 B N 80
# 5 B Don_t_know 10
# 6 A Don_t_know NA
or even
dat %>%
full_join(expand(., Sample, Observation))
# Joining, by = c("Sample", "Observation")
# Sample Observation Percent
# 1 A Y 50
# 2 A N 50
# 3 B Y 10
# 4 B N 80
# 5 B Don_t_know 10
# 6 A Don_t_know NA
Upvotes: 1