Ginger Geiger
Ginger Geiger

Reputation: 1

R merge data /expand data set

I'm trying to expand my data set using R. I have recorded the observations for each sample and calculated percentages based on those observations. I need now to expand each sample to list each possible observation without doing any calculations. Example of myData: Starting data set:

Sample    Observation    Percent
A         Y              50
A         N              50
B         Y              10
B         N              80
B         Don't know     10 

Desired data set:

Sample    Observation    Percent
A         Y              50
A         N              50
A         Don't know     NA
B         Y              10
B         N              80
B         Don't know     10 

So in this case, I would need to expand all of sample A to include the "Don't know" category and fill that in with "NA".

I have tried

myTable <- table(myData)
TableFrame2 <- data.frame(myTable)

Which expands the data set but messes up the Percentage column (?why). I thought I could merge the percentages back, but I need to match that column to the expanded set by both the sample and Observation columns to get an exact match. Any suggestions?

Upvotes: 0

Views: 215

Answers (1)

r2evans
r2evans

Reputation: 160437

One way is to merge/join the combinations back into the data. (I altered the data slightly to make it easy to copy/paste here in SO.)

dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
Sample    Observation    Percent
A         Y              50
A         N              50
B         Y              10
B         N              80
B         Don_t_know     10 ')

Base R

merge(
  dat,
  expand.grid(Sample = unique(dat$Sample),
              Observation = unique(dat$Observation),
              stringsAsFactors = FALSE),
  by = c("Sample", "Observation"),
  all = TRUE
)
#   Sample Observation Percent
# 1      A  Don_t_know      NA
# 2      A           N      50
# 3      A           Y      50
# 4      B  Don_t_know      10
# 5      B           N      80
# 6      B           Y      10

Tidyverse:

library(dplyr)
library(tidyr)

dat %>%
  full_join(
    crossing(Sample = unique(dat$Sample), Observation = unique(dat$Observation)),
    by = c("Sample", "Observation")
  )
#   Sample Observation Percent
# 1      A           Y      50
# 2      A           N      50
# 3      B           Y      10
# 4      B           N      80
# 5      B  Don_t_know      10
# 6      A  Don_t_know      NA

or even

dat %>%
  full_join(expand(., Sample, Observation))
# Joining, by = c("Sample", "Observation")
#   Sample Observation Percent
# 1      A           Y      50
# 2      A           N      50
# 3      B           Y      10
# 4      B           N      80
# 5      B  Don_t_know      10
# 6      A  Don_t_know      NA

Upvotes: 1

Related Questions