Reputation: 41

Subset on conditions of two categorical columns in r

I would like to subset values from one continuous column based on the conditions of two categorical columns.

> df.example <- data.frame(V1 = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
+                V2 = c("B",  "A",  "B", "A", "B",  "A", "B",  "A", "B",  "A"),
+                     V3 = c("5",  "3",  "1", "7", "11", "2", "2", "11", "1", "3"))  
> df.example
   V1 V2 V3
1   A  B  5
2   A  A  3
3   A  B  1
4   B  A  7
5   B  B 11
6   B  A  2
7   A  B  2
8   A  A 11
9   B  B  1
10  B  A  3

From df.example I would like to retrieve those rows where V3 is max for each unique combination of V1=V2; here

5   B  B 11
8   A  A 11

and the minimum for each unique combination when V1=/V2; here

3    A  B  1
10   B  A  3

This is simplified of course, there are 100's of categories in V1 and V2 in my data frames.

Upvotes: 3

Answers (3)

arg0naut91

Reputation: 14774

Another option:

library(dplyr)

df.example %>%
  group_by(V1, V2) %>%
  filter(V3 == eval(parse(text = paste0(c('max', 'min')[(V1 != V2) + 1], '(V3)'))))

Output:

# A tibble: 4 x 3
# Groups:   V1, V2 [4]
  V1    V2       V3
  <chr> <chr> <int>
1 A     B         1
2 B     B        11
3 B     A         2
4 A     A        11

Or similarly to @IanCampbell in dplyr:

library(dplyr)

df.example %>%
  group_by(V1, V2) %>%
  filter(if (V1 != V2) V3 == min(V3) else V3 == max(V3))

Just add mutate(V3 = as.integer(as.character(V3))) %>% as second step if your V3 is really not integer.

Upvotes: 1

Ian Campbell

Reputation: 24868

Here is a data.table approach.

library(data.table)
setDT(df.example)
#In supplied data, V3 is a character vector
df.example[,V3 := as.numeric(as.matrix(V3))]
df.example[,lapply(.SD,function(x){
  if(V1==V2)
    {max(V3)}
  else
    {min(V3)}})
  ,by=list(V1,V2)]
   V1 V2 V3
1:  A  B  1
2:  A  A 11
3:  B  A  2
4:  B  B 11

Upvotes: 2

tmfmnk

Reputation: 40171

One possibility utilizing dplyr could be:

df.example %>%
 filter(V1 == V2) %>%
 group_by(V1, V2) %>%
 summarise_all(max) %>%
 bind_rows(df.example %>%
            filter(V1 != V2) %>%
            group_by(V1, V2) %>%
            summarise_all(min))

  V1    V2       V3
  <fct> <fct> <dbl>
1 A     A        11
2 B     B        11
3 A     B         1
4 B     A         2

Upvotes: 0

Subset on conditions of two categorical columns in r

Answers (3)

Related Questions