Tunn
Tunn

Reputation: 1536

Minimum value matching across values in multiple columns

I would like to return a dataframe with the minimum value of column one based on the values of columns 2-4:

df <- data.frame(one = rnorm(1000),
                 two = sample(letters, 1000, replace = T),
                 three = sample(letters, 1000, replace = T),
                 four = sample(letters, 1000, replace = T))

I can do:

df_group <- df %>%
  group_by(two) %>%
  filter(one = min(one))

This gets me the lowest value of all the "m's" in column two, but what if column three or four had a lower "m" value in column one?

The output should look like this:

            one   two
1  -0.311609752   r
2   0.053166742   n
3   1.546485810   a
4  -0.430308725   d
5  -0.145428664   c
6   0.419181639   u
7   0.008881661   i
8   1.223517580   t
9   0.797273157   b
10  0.790565358   v
11 -0.560031797   e
12 -1.546234090   q
13 -1.847945540   l
14 -1.489130228   z
15 -1.203255034   g
16  0.146969892   m
17 -0.552363433   f
18 -0.006234646   w
19  0.982932856   s
20  0.751936728   o
21  0.220751258   h
22 -1.557436228   y
23 -2.034885868   k
24 -0.463354387   j
25 -0.351448850   p
26  1.331365941   x

I don't care which column has the lowest value for a given letter, I just need the lowest value and the letter column.

I'm trying to wrap my head around writing this simplistically. This might be a duplicate, but I didn't know how to word the title and couldn't find any material or previous questions on how to do it.

Upvotes: 1

Views: 251

Answers (2)

agstudy
agstudy

Reputation: 121568

Another solution based in data.table :

library(data.table)
setDT(df)
melt(df,
     measure=grep("one",names(df),invert = TRUE,value=TRUE))[
  ,min(one),value]

Upvotes: 2

akuiper
akuiper

Reputation: 214957

You can do something like this:

library(dplyr); library(tidyr)

df %>% gather(cols, letts, -one) %>%        # gather all letters into one column
       group_by(letts) %>% 
       summarise(one = min(one))            # do a group by summary for each letter

# A tibble: 26 × 2
#   letts       one
#   <chr>     <dbl>
#1      a -2.092327
#2      b -2.461102
#3      c -3.055858
#4      d -2.092327
#5      e -2.461102
#6      f -2.249439
#7      g -1.941632
#8      h -2.543310
#9      i -3.055858
#10     j -1.896974
# ... with 16 more rows

Upvotes: 1

Related Questions