Reputation: 3432
Hello everyone, I need help removing duplicate rows from a data frame, but only when the value in a column is above a threshold.
Here is the data frame:
Group Species Values
1 G1 Cattus_cattus 10
2 G1 Cattus_cattus 10
3 G1 Cattus_cattus 10
4 G2 Canis_lupus 2
5 G2 Canis_lupus 2
6 G3 Griseus_lupa 90
7 G4 Griseus_lupa 89
I would like to remove rows that are duplicated on c(Group, Species), but only when Values > 5.
The expected result is:
Group Species Values
1 G1 Cattus_cattus 10
4 G2 Canis_lupus 2
5 G2 Canis_lupus 2
6 G3 Griseus_lupa 90
7 G4 Griseus_lupa 89
The data:
structure(list(Group = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L
), .Label = c("G1", "G2", "G3", "G4"), class = "factor"), Species = structure(c(2L,
2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Canis_lupus", "Cattus_cattus",
"Griseus_lupa"), class = "factor"), Values = c(10L, 10L, 10L,
2L, 2L, 90L, 89L)), class = "data.frame", row.names = c(NA, -7L
))
Upvotes: 4
Views: 249
Reputation: 78927
library(dplyr)
df %>%
group_by(Group, Species) %>%
slice(if(any(Values > 5)) 1 else 1:n())
# output:
# Groups: Group, Species [4]
Group Species Values
<fct> <fct> <int>
1 G1 Cattus_cattus 10
2 G2 Canis_lupus 2
3 G2 Canis_lupus 2
4 G3 Griseus_lupa 90
5 G4 Griseus_lupa 89
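One thing worth noting about the slice() approach: any(Values > 5) is evaluated once per group, so a group collapses to its first row as soon as a single value in it exceeds the threshold. The sketch below compares this with a row-wise rule. The group G5 and its values are hypothetical and not part of the question's data.

```r
library(dplyr)

# Hypothetical data (not from the question): one group with mixed values.
mixed <- data.frame(
  Group   = c("G5", "G5", "G5"),
  Species = c("Vulpes_vulpes", "Vulpes_vulpes", "Vulpes_vulpes"),
  Values  = c(8L, 3L, 3L)
)

# Group-level rule: if any value in the group is above 5, keep only row 1.
by_group <- mixed %>%
  group_by(Group, Species) %>%
  slice(if (any(Values > 5)) 1 else 1:n()) %>%
  ungroup()

# Row-level rule: drop a row only if it repeats an earlier Group/Species
# pair AND its own value is above 5.
by_row <- mixed %>%
  filter(!(duplicated(data.frame(Group, Species)) & Values > 5))

nrow(by_group)  # 1
nrow(by_row)    # 3
```

On the question's data both rules give the same five rows, because values are constant within each group. Which rule is wanted for mixed groups is not stated in the question.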
Upvotes: 2
Reputation: 887048
Using dplyr
library(dplyr)
x %>%
filter(!duplicated(x) | Values <= 5)
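Note that duplicated(x) compares whole rows, Values included. If the duplicates should be judged on Group and Species only, as the question asks, a possible variant is sketched below. The sample data is a hypothetical modification of the question's data, with one G1 value changed to 12 so that the two rules differ.

```r
library(dplyr)

# Hypothetical variation of the question's data: third G1 row has Values = 12.
x <- data.frame(
  Group   = c("G1", "G1", "G1", "G2", "G2"),
  Species = c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
              "Canis_lupus", "Canis_lupus"),
  Values  = c(10L, 10L, 12L, 2L, 2L)
)

# Whole-row duplicates: the row with Values = 12 is unique, so it is kept.
full_row <- x %>%
  filter(!duplicated(x) | Values <= 5)

# Duplicates on the key columns only: the row with Values = 12 repeats
# the G1/Cattus_cattus pair and is above 5, so it is dropped.
key_only <- x %>%
  filter(!(duplicated(data.frame(Group, Species)) & Values > 5))

full_row$Values  # 10 12 2 2
key_only$Values  # 10 2 2
```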
Upvotes: 2
Reputation: 39657
You can use duplicated and combine it with an or (|) that tests for x$Values <= 5.
x[!duplicated(x) | x$Values <= 5,]
#x[!(duplicated(x) & x$Values > 5),] #Alternative
# Group Species Values
#1 G1 Cattus_cattus 10
#4 G2 Canis_lupus 2
#5 G2 Canis_lupus 2
#6 G3 Griseus_lupa 90
#7 G4 Griseus_lupa 89
Or only for Group and Species:
x[!(duplicated(x[c("Group","Species")]) & x$Values > 5),]
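If the key columns or the threshold need to vary, the last expression could be wrapped in a small function. This is only a sketch: drop_dups_above and its argument names are made up for illustration, and it assumes the value column contains no NA (an NA there would produce an NA row in the result).

```r
# Hypothetical helper; name and arguments are illustrative only.
drop_dups_above <- function(data, keys, value_col, threshold) {
  is_dup <- duplicated(data[keys])          # repeats an earlier key combination
  above  <- data[[value_col]] > threshold   # value exceeds the threshold
  data[!(is_dup & above), , drop = FALSE]   # drop rows where both hold
}

# The question's data, rebuilt with plain character columns.
x <- data.frame(
  Group   = c("G1", "G1", "G1", "G2", "G2", "G3", "G4"),
  Species = c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
              "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa"),
  Values  = c(10L, 10L, 10L, 2L, 2L, 90L, 89L)
)

res <- drop_dups_above(x, c("Group", "Species"), "Values", 5)
rownames(res)  # "1" "4" "5" "6" "7"
```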
Upvotes: 3