MonkeyCousin

Reputation: 189

Add rows and values to a data frame based on unmatched entries in R

My data resembles:

dataset <- c(rep("K19", 4), rep("K20", 5))
feature <- c(letters[1:4], "a", "b", "c", "e", "f")
gain <- c(0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.18, 0.05, 0.02)
mydata <- data.frame(dataset, feature, gain)

K19 and K20 represent years that share some features but differ in others. To make a barplot like the one over here, I need to fill in the missing features so that each year has all six features "a", "b", "c", "d", "e", "f". The 'gain' value for each previously missing feature should be zero, like this:

dataset <- c(rep("K19", 6), rep("K20", 6))
feature <- c(letters[1:6], letters[1:6])
gain <- c(0.4, 0.3, 0.2, 0.1, 0.0, 0.0, 0.35, 0.3, 0.18, 0.0, 0.05, 0.02)
mydata <- data.frame(dataset, feature, gain)

Upvotes: 1

Views: 29

Answers (1)

Peter

Reputation: 12729

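You can use tidyr::complete(). It expands the data to every combination of dataset and feature, and the fill argument sets gain to 0 for the combinations that were missing:

library(tidyr)
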
complete(mydata, dataset, feature, fill = list(gain = 0))

#> # A tibble: 12 x 3
#>    dataset feature  gain
#>    <chr>   <chr>   <dbl>
#>  1 K19     a        0.4 
#>  2 K19     b        0.3 
#>  3 K19     c        0.2 
#>  4 K19     d        0.1 
#>  5 K19     e        0   
#>  6 K19     f        0   
#>  7 K20     a        0.35
#>  8 K20     b        0.3 
#>  9 K20     c        0.18
#> 10 K20     d        0   
#> 11 K20     e        0.05
#> 12 K20     f        0.02

Created on 2021-04-25 by the reprex package (v2.0.0)
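
If adding a package is not an option, the same result can probably be reached in base R. The following is only a sketch, not part of the original answer: it builds the full grid of dataset/feature pairs with expand.grid(), left-joins the data onto it with merge(), and then replaces the resulting NA values with 0. The names all_combos and filled are made up for illustration.

```r
# Rebuild the example data from the question.
dataset <- c(rep("K19", 4), rep("K20", 5))
feature <- c(letters[1:4], "a", "b", "c", "e", "f")
gain <- c(0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.18, 0.05, 0.02)
mydata <- data.frame(dataset, feature, gain)

# Every dataset/feature combination that should exist (hypothetical name).
all_combos <- expand.grid(
  dataset = unique(as.character(mydata$dataset)),
  feature = sort(unique(as.character(mydata$feature))),
  stringsAsFactors = FALSE
)

# Left join onto the full grid; combinations absent from mydata get NA for gain.
filled <- merge(all_combos, mydata, by = c("dataset", "feature"), all.x = TRUE)

# Replace the NA gains with zero.
filled$gain[is.na(filled$gain)] <- 0

# Order explicitly by year and feature, then reset the row names.
filled <- filled[order(filled$dataset, filled$feature), ]
rownames(filled) <- NULL
filled
```

Unlike complete(), this returns a plain data.frame rather than a tibble.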

data

dataset <- c(rep("K19", 4), rep("K20", 5))
feature <- c(letters[1:4], "a", "b", "c", "e", "f")
gain <- c(0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.18, 0.05, 0.02)
mydata <- data.frame(dataset, feature, gain)
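
For the barplot itself, here is a rough sketch that assumes a grouped barplot with one bar per year for each feature is what is wanted (the plot linked in the question is not shown here, so that is a guess). xtabs() sums gain over each dataset/feature pair, so combinations with no rows come out as 0 without a separate completion step. The name gain_by_year is made up for illustration.

```r
# Rebuild the example data from the question.
dataset <- c(rep("K19", 4), rep("K20", 5))
feature <- c(letters[1:4], "a", "b", "c", "e", "f")
gain <- c(0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.18, 0.05, 0.02)
mydata <- data.frame(dataset, feature, gain)

# 2 x 6 table: rows are years, columns are features.
# Pairs that do not occur in mydata sum to 0.
gain_by_year <- xtabs(gain ~ dataset + feature, data = mydata)

# Grouped bars: one cluster per feature, one bar per year.
barplot(gain_by_year, beside = TRUE, legend.text = TRUE,
        xlab = "feature", ylab = "gain")
```

This only works as a shortcut because each dataset/feature pair appears at most once; with repeated pairs, xtabs() would add their gains together.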

Upvotes: 1
