Reputation: 11
I would like to know how I can delete specific rows based on the values in a column, where the deletion depends on the other values in the same subgroup (rows sharing an id). I would like to delete "aja" rows if they are grouped together with "ase". If the subgroup contains only "ase" or only "aja", the script should leave it alone. I have indicated which rows should be deleted by the script.
   id somedata subgroup
1   1 "aja"    okay
2   1 "aja"    okay
3   2 "ase"    okay
4   2 "aja"    delete
5   3 "aja"    delete
6   3 "ase"    okay
7   4 "aja"    okay
8   4 "aja"    okay
9   5 "ase"    okay
10  5 "ase"    okay
11  6 "aja"    delete
12  6 "ase"    okay
Code to generate the data
id = c(1,1,2,2,3,3,4,4,5,5,6,6)
somedata = c("aja","aja","ase","aja","aja","ase","aja","aja","ase","ase","aja","ase")
subgroup = c("okay","okay","okay","DELETE","DELETE","okay","okay","okay","okay","okay","DELETE","okay")
proov = data.frame(cbind(id,somedata,subgroup))
Upvotes: 0
Views: 50
Reputation: 246
Without the use of any additional packages, you can use this command:
proov = proov[!(proov$id %in% unique(proov[which(proov$somedata == "ase"), "id"]) & proov$somedata == "aja"),]
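The one-liner above packs three steps into a single expression. A minimal sketch of the same logic split into named steps may be easier to follow; the names `ids_with_ase`, `drop_row` and `result` are made up for illustration, and the data is rebuilt with a plain `data.frame()` call (an assumption, the question used `data.frame(cbind(...))`).

```r
# Sample data from the question, without the helper "subgroup" column.
proov <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  somedata = c("aja", "aja", "ase", "aja", "aja", "ase",
               "aja", "aja", "ase", "ase", "aja", "ase"),
  stringsAsFactors = FALSE
)

# Step 1: ids of the groups that contain at least one "ase".
ids_with_ase <- unique(proov$id[proov$somedata == "ase"])

# Step 2: flag "aja" rows that belong to one of those groups.
drop_row <- proov$somedata == "aja" & proov$id %in% ids_with_ase

# Step 3: keep every row that is not flagged.
result <- proov[!drop_row, ]
```

On the sample data this should keep 9 of the 12 rows, dropping rows 4, 5 and 11.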
Upvotes: 0
Reputation: 388862
We can group by `id` and remove rows where `somedata == "aja"` and there is at least one "ase" in the same group.
library(dplyr)
proov %>% group_by(id) %>% filter(!(somedata == "aja" & any(somedata == "ase")))
# id somedata subgroup
# <fct> <fct> <fct>
#1 1 aja okay
#2 1 aja okay
#3 2 ase okay
#4 3 ase okay
#5 4 aja okay
#6 4 aja okay
#7 5 ase okay
#8 5 ase okay
#9 6 ase okay
which in base R can be written as
subset(proov, !as.logical(ave(as.character(somedata), id,
                              FUN = function(x) x == "aja" & any(x == "ase"))))
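The `as.character()` and `as.logical()` wrappers are there because `ave()` returns a vector of the same type as its input, so the logical result comes back as text. A possible variant, sketched here under the assumption that a numeric 0/1 indicator is acceptable, sidesteps the conversion by letting `ave()` work on numbers; `has_ase` and `result` are illustrative names.

```r
# Sample data from the question, without the helper "subgroup" column.
proov <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  somedata = c("aja", "aja", "ase", "aja", "aja", "ase",
               "aja", "aja", "ase", "ase", "aja", "ase"),
  stringsAsFactors = FALSE
)

# 1 where the row is "ase", 0 otherwise; ave() spreads the group maximum
# back over every row of that id, so has_ase is TRUE for whole groups.
has_ase <- ave(as.numeric(proov$somedata == "ase"), proov$id, FUN = max) > 0

# Drop "aja" rows only in groups that also contain "ase".
result <- subset(proov, !(somedata == "aja" & has_ase))
```

Keeping the group-level flag in its own variable also makes it easy to inspect which ids are affected before deleting anything.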
Upvotes: 2
Reputation: 51582
You can do a simple filtering, i.e.
library(dplyr)
proov %>%
  group_by(id) %>%
  filter(!(n_distinct(somedata) > 1 & somedata == 'aja'))
which gives,
# A tibble: 9 x 3
# Groups:   id [6]
  id    somedata subgroup
  <fct> <fct>    <fct>
1 1     aja      okay
2 1     aja      okay
3 2     ase      okay
4 3     ase      okay
5 4     aja      okay
6 4     aja      okay
7 5     ase      okay
8 5     ase      okay
9 6     ase      okay
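The `n_distinct()` filter matches the question's data because `somedata` only ever holds "aja" or "ase" there. If a third value could appear, the two dplyr filters may disagree. The sketch below uses made-up data with a hypothetical value "muu" to show the difference; `demo`, `by_distinct` and `by_any_ase` are illustrative names.

```r
library(dplyr)

# Hypothetical data: group 2 pairs "aja" with a third value, "muu",
# which does not occur in the question's data.
demo <- data.frame(
  id = c(1, 1, 2, 2),
  somedata = c("aja", "ase", "aja", "muu"),
  stringsAsFactors = FALSE
)

# Drops "aja" from any group that holds more than one distinct value.
by_distinct <- demo %>%
  group_by(id) %>%
  filter(!(n_distinct(somedata) > 1 & somedata == "aja")) %>%
  ungroup()

# Drops "aja" only from groups that actually contain "ase".
by_any_ase <- demo %>%
  group_by(id) %>%
  filter(!(somedata == "aja" & any(somedata == "ase"))) %>%
  ungroup()
```

Here `by_distinct` loses the "aja" row in group 2 as well, while `by_any_ase` keeps it, so the `any(somedata == "ase")` condition is the closer match to the rule stated in the question.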
Upvotes: 2