Reputation: 91
I would like to select for each ID
the two closest values of Cq
. I thought I'd figured it out, but it depends on row position...
Here is an example of the form of my dataset :
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))
ID Cq
1 A 34.32
2 A 34.40
3 A 34.31
4 B 31.49
5 B 31.40
6 B 31.49
7 C 31.22
8 C 31.31
9 C 31.08
And what I tried
df4 <-df %>%
group_by(ID) %>%
arrange(Cq) %>%
mutate(diffvals= Cq - lag(Cq)) %>%
filter(row_number() == 1 | row_number() == 2)
#Output
ID Cq diffvals
1 A 34.31 NA
2 A 34.32 0.0100
3 B 31.40 NA
4 B 31.49 0.0900
5 C 31.08 NA
6 C 31.22 0.14
And the expected Output
ID Cq
1 A 34.32
2 A 34.31
3 B 31.49
4 B 31.49
5 C 31.22
6 C 31.31
I've tried sorting my dataset before, but it doesn't change anything. I also tried using filter(diffvals=wich.min==diffvals)
but I don't know how to extract the two smallest.
If you have any ideas, it would help me a lot!
Thanks in advance
Upvotes: 9
Views: 270
Reputation: 173793
In base R
do.call(rbind, lapply(split(df, df$ID), function(x){
cell <- order(abs(outer(x$Cq, x$Cq, `-`)))[-seq(nrow(x))][1] - 1;
x[c((cell %/% nrow(x)) + 1, (cell %% nrow(x)) + 1),]}))
#> ID Cq
#> A.1 A 34.32
#> A.3 A 34.31
#> B.4 B 31.49
#> B.6 B 31.49
#> C.7 C 31.22
#> C.8 C 31.31
Upvotes: 1
Reputation: 101189
Here is a base R code, where dist
is used to enumerate distances of all pairs within groups, i.e.,
dfout <- do.call(rbind,
lapply(split(df,df$ID),
function(v) {
d <- `diag<-`(as.matrix(dist(v$Cq)),NA)
d[lower.tri(d)] <- NA
v[which(d==min(d,na.rm = T),arr.ind = T),]
}
))
such that
> dfout
ID Cq
A.1 A 34.32
A.3 A 34.31
B.4 B 31.49
B.6 B 31.49
C.7 C 31.22
C.8 C 31.31
Upvotes: 4
Reputation: 10375
Different output but functionally equivalent
do.call(rbind,
by(df,list(df$ID),function(x){
tmp=abs(outer(x$Cq,x$Cq,"-"))
tmp[upper.tri(tmp,diag=T)]=Inf
x$Cq[which(tmp==min(tmp),arr.ind=T)]
})
)
[,1] [,2]
A 34.31 34.32
B 31.49 31.49
C 31.31 31.22
Upvotes: 0
Reputation: 388862
Using dplyr
one option is to do a full_join
with itself
based on ID
. Remove the rows which are generated in combination with itself and for each ID
select the row with minimum difference and get the data in long format.
library(dplyr)
df %>%
mutate(Row = row_number()) %>%
full_join(df, by = 'ID') %>%
group_by(ID, Row) %>%
filter(Cq.x != Cq.y) %>%
group_by(ID) %>%
slice(which.min(abs(Cq.x - Cq.y))) %>%
tidyr::pivot_longer(cols = starts_with('Cq')) %>%
select(-Row, -name)
# ID value
# <fct> <dbl>
#1 A 34.3
#2 A 34.3
#3 B 31.5
#4 B 31.4
#5 C 31.2
#6 C 31.3
Upvotes: 3
Reputation: 827
Try this:
library(tidyverse)
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))
df_summ <-
df %>%
group_by(ID) %>%
arrange(Cq) %>%
mutate(
prev = lag(Cq),
diff= Cq - lag(Cq)) %>%
drop_na()
df_summ %>%
group_by(ID) %>%
summarise(diff = min(diff)) %>%
left_join(df_summ) %>%
select(-diff) %>%
pivot_longer(c(Cq, prev), values_to = "cq") %>%
select(-name)
Regards Paweł
Upvotes: 1