Reputation: 693
I'm struggling on how can I keep on my dataframe only the elements that the difference of values in measure
are minimal 2
for each group in id1
. Here is a toy example:
test = data.frame(measure = c(1, 2, 3, 2, 2, 4, 0, 1),
id1 = c("A", "B", "A", "B", "C", "C", "A", "D"),
id2 = c("16", "16", "16", "16", "17", "17", "17", "18"))
# > test
# measure id1 id2
# 1 1 A 16
# 2 2 B 16
# 3 3 A 16
# 4 2 B 16
# 5 2 C 17
# 6 4 C 17
# 7 0 A 17
# 8 1 D 18
#as result, I'd like something like
#> res
# measure id1 id2
# 1 1 A 16
# 3 3 A 16
# 5 2 C 17
# 6 4 C 17
# 7 0 A 17
The id1
equal A
and D
where removed, since their max(value) - min(value) < 2
.
Any hint on how can I apply this?
Upvotes: 1
Views: 65
Reputation: 4169
The data.table structure provides an option here.
# Load library
library(data.table)
# Create data.table
test <-
data.table(measure = c(1, 2, 3, 2, 2, 4, 0, 1),
id1 = c("A", "B", "A", "B", "C", "C", "A", "D"),
id2 = c("16", "16", "16", "16", "17", "17", "17", "18"))
# Solution:
test[id1 %in% test[, .(max(measure)-min(measure)), by=id1][V1>=2, id1]]
In this solution, test[, .(max(measure)-min(measure)), by=id1]
is giving a data.table of differences and [V1>=2, id1]
is subsetting that and returning a vector of valid id1 values. The id1 %in%
then subsets the original data on the contents of that vector.
The result is:
measure id1 id2
1: 1 A 16
2: 3 A 16
3: 2 C 17
4: 4 C 17
5: 0 A 17
A mini data.table lesson:
With data.tables, we use the notation DT[i, j, by]
where:
DT
is the data.tablei
is the operations on observations (e.g. filter, sort)j
is the operations on variables (e.g. select, process)by
(and keyby
) provides grouping for the j
-operationsdata.tables have some special characters including
.N
for "number of observations".SD
for "selected variables"Install and load the data.table package to use this structure.
Convert an existing data.frame, list or tibble into a data.table using setDT()
or data.table()
Use fread()
to read external data files directly into a data.table.
An example - To get the number of cars and mean fuel efficiency of cars from mtcars, where hp>100 and summary stats grouped by number of cylinders:
dtcars <- data.table(mtcars)
dtcars[hp>100, .("n" = .N, "mean_mpg" = mean(mpg)), keyby=cyl]
Upvotes: 1
Reputation: 2924
Another tidyverse
option.
test %>%
group_by(id1) %>%
filter((max(measure)- min(measure)) >= 2)
Upvotes: 2
Reputation: 25323
A possible solution:
library(tidyverse)
test = data.frame(measure = c(1, 2, 3, 2, 2, 4, 0, 1),
id1 = c("A", "B", "A", "B", "C", "C", "A", "D"),
id2 = c("16", "16", "16", "16", "17", "17", "17", "18"))
test %>%
group_by(id1) %>%
mutate(aux = (max(c_across(measure))- min(c_across(measure))) >= 2) %>%
ungroup %>% filter(aux) %>% select(-aux)
#> # A tibble: 5 × 3
#> measure id1 id2
#> <dbl> <chr> <chr>
#> 1 1 A 16
#> 2 3 A 16
#> 3 2 C 17
#> 4 4 C 17
#> 5 0 A 17
Upvotes: 1