user12933512
user12933512

Reputation: 91

How to select among 3 values, the 2 closest to each other in R?

I would like to select for each ID the two closest values of Cq. I thought I'd figured it out, but it depends on row position...

Here is an example of the form of my dataset :

df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"), 
                 Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))
  ID    Cq
1  A 34.32
2  A 34.40
3  A 34.31
4  B 31.49
5  B 31.40
6  B 31.49
7  C 31.22
8  C 31.31
9  C 31.08

And what I tried

df4 <-df %>% 
  group_by(ID) %>% 
  arrange(Cq) %>% 
  mutate(diffvals= Cq - lag(Cq)) %>%
  filter(row_number() == 1 | row_number() == 2)

#Output
ID       Cq   diffvals
1 A      34.31   NA     
2 A      34.32   0.0100
3 B      31.40   NA     
4 B      31.49   0.0900
5 C      31.08   NA     
6 C      31.22   0.14 

And the expected Output

 ID    Cq
1  A 34.32
2  A 34.31
3  B 31.49
4  B 31.49
5  C 31.22
6  C 31.31

I've tried sorting my dataset before, but it doesn't change anything. I also tried using filter(diffvals=wich.min==diffvals)but I don't know how to extract the two smallest.

If you have any ideas, it would help me a lot!

Thanks in advance

Upvotes: 9

Views: 270

Answers (5)

Allan Cameron
Allan Cameron

Reputation: 173793

In base R

do.call(rbind, lapply(split(df, df$ID), function(x){ 
  cell <- order(abs(outer(x$Cq, x$Cq, `-`)))[-seq(nrow(x))][1] - 1;
  x[c((cell %/% nrow(x)) + 1, (cell %% nrow(x)) + 1),]}))
#>     ID    Cq
#> A.1  A 34.32
#> A.3  A 34.31
#> B.4  B 31.49
#> B.6  B 31.49
#> C.7  C 31.22
#> C.8  C 31.31

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101189

Here is a base R code, where dist is used to enumerate distances of all pairs within groups, i.e.,

dfout <- do.call(rbind,
                 lapply(split(df,df$ID), 
                        function(v) {
                          d <- `diag<-`(as.matrix(dist(v$Cq)),NA)
                          d[lower.tri(d)] <- NA
                          v[which(d==min(d,na.rm = T),arr.ind = T),]
                        }
                 ))

such that

> dfout
    ID    Cq
A.1  A 34.32
A.3  A 34.31
B.4  B 31.49
B.6  B 31.49
C.7  C 31.22
C.8  C 31.31

Upvotes: 4

user2974951
user2974951

Reputation: 10375

Different output but functionally equivalent

do.call(rbind,
  by(df,list(df$ID),function(x){
    tmp=abs(outer(x$Cq,x$Cq,"-"))
    tmp[upper.tri(tmp,diag=T)]=Inf
    x$Cq[which(tmp==min(tmp),arr.ind=T)]
  })
)

   [,1]  [,2]
A 34.31 34.32
B 31.49 31.49
C 31.31 31.22

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388862

Using dplyr one option is to do a full_join with itself based on ID. Remove the rows which are generated in combination with itself and for each ID select the row with minimum difference and get the data in long format.

library(dplyr)

df %>%
  mutate(Row = row_number()) %>%
  full_join(df, by = 'ID') %>%
  group_by(ID, Row) %>%
  filter(Cq.x != Cq.y) %>%
  group_by(ID) %>%
  slice(which.min(abs(Cq.x - Cq.y))) %>%
  tidyr::pivot_longer(cols  = starts_with('Cq')) %>%
  select(-Row, -name)

#  ID    value
#  <fct> <dbl>
#1 A      34.3
#2 A      34.3
#3 B      31.5
#4 B      31.4
#5 C      31.2
#6 C      31.3

Upvotes: 3

Pawel Stradowski
Pawel Stradowski

Reputation: 827

Try this:

library(tidyverse)
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"), 
                 Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))

df_summ <- 
  df %>% 
  group_by(ID) %>% 
  arrange(Cq) %>% 
  mutate(
    prev = lag(Cq),
    diff= Cq - lag(Cq)) %>% 
  drop_na()
df_summ %>% 
  group_by(ID) %>% 
  summarise(diff = min(diff)) %>% 
  left_join(df_summ) %>% 
  select(-diff) %>% 
  pivot_longer(c(Cq, prev), values_to = "cq") %>% 
  select(-name)

Regards Paweł

Upvotes: 1

Related Questions