Reputation: 189
My data looks like this:
dataset <- c(rep("K19", 4), rep("K20", 5))
feature <- c(letters[1:4], "a", "b", "c", "e", "f")
gain <- (c(0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.18, 0.05, 0.02))
mydata <- data.frame(dataset, feature, gain)
K19 and K20 represent years that share some features but differ in others. To make a barplot like the one over here, I need to fill in the missing features so that each year has all of the features "a", "b", "c", "d", "e", "f". The 'gain' value for the previously missing features should be zero, like this:
dataset <- c(rep("K19", 6), rep("K20", 6))
feature <- c(letters[1:6], letters[1:6])
gain <- (c(0.4, 0.3, 0.2, 0.1, 0.0, 0.0, 0.35, 0.3, 0.18, 0.0, 0.05, 0.02))
mydata <- data.frame(dataset, feature, gain)
Upvotes: 1
Views: 29
Reputation: 12729
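You can use tidyr::complete(). It expands the data to every combination of dataset and feature, and the fill argument sets gain to 0 for the rows it adds: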
library(tidyr)
complete(mydata, dataset, feature, fill = list(gain = 0))
#> # A tibble: 12 x 3
#>    dataset feature  gain
#>    <chr>   <chr>   <dbl>
#>  1 K19     a        0.4
#>  2 K19     b        0.3
#>  3 K19     c        0.2
#>  4 K19     d        0.1
#>  5 K19     e        0
#>  6 K19     f        0
#>  7 K20     a        0.35
#>  8 K20     b        0.3
#>  9 K20     c        0.18
#> 10 K20     d        0
#> 11 K20     e        0.05
#> 12 K20     f        0.02
Created on 2021-04-25 by the reprex package (v2.0.0)
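If you would rather not add a dependency, roughly the same result can be had in base R. This is only a sketch of one way to do it, not what complete() does internally: build every dataset/feature pair with expand.grid(), left-join the original data with merge(), and set the resulting NA gains to 0. The names all_combos and filled are made up for this example.

```r
# Same example data as in the question
dataset <- c(rep("K19", 4), rep("K20", 5))
feature <- c(letters[1:4], "a", "b", "c", "e", "f")
gain <- c(0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.18, 0.05, 0.02)
mydata <- data.frame(dataset, feature, gain)

# Every dataset/feature pair that should exist (2 years x 6 features)
all_combos <- expand.grid(
  dataset = unique(mydata$dataset),
  feature = sort(unique(mydata$feature)),
  stringsAsFactors = FALSE
)

# Left join: pairs that are absent from mydata get gain = NA
filled <- merge(all_combos, mydata, by = c("dataset", "feature"), all.x = TRUE)

# Replace the NAs introduced by the join with 0
filled$gain[is.na(filled$gain)] <- 0

# Order by year, then feature, to match the layout asked for in the question
filled <- filled[order(filled$dataset, filled$feature), ]
filled
```

Note that this turns every NA in gain into 0, so if the real data has genuine NAs you may want to handle those separately first.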
data
dataset <- c(rep("K19", 4), rep("K20", 5))
feature <- c(letters[1:4], "a", "b", "c", "e", "f")
gain <- (c(0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.18, 0.05, 0.02))
mydata <- data.frame(dataset, feature, gain)
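For the barplot itself, a possible shortcut is base R's xtabs(), which cross-tabulates gain by dataset and feature and leaves 0 in the cells that have no row. I can't see the linked plot, so the grouped (side-by-side) layout below is an assumption, and the name gain_tab is just for this sketch.

```r
# Same example data as in the question
dataset <- c(rep("K19", 4), rep("K20", 5))
feature <- c(letters[1:4], "a", "b", "c", "e", "f")
gain <- c(0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.18, 0.05, 0.02)
mydata <- data.frame(dataset, feature, gain)

# Sum gain per dataset/feature cell; pairs with no row come out as 0
gain_tab <- xtabs(gain ~ dataset + feature, data = mydata)
gain_tab

# Grouped bars: one group per feature, one bar per year
# (assumed layout, the original target plot is not shown here)
barplot(gain_tab, beside = TRUE, legend.text = TRUE,
        xlab = "feature", ylab = "gain")
```

Because xtabs() sums within each cell, this only matches the filled-in data frame when each dataset/feature pair appears at most once, as it does here.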
Upvotes: 1