Reputation: 41
I would like to subset values from one continuous column based on the conditions of two categorical columns.
> df.example <- data.frame(V1 = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
+ V2 = c("B", "A", "B", "A", "B", "A", "B", "A", "B", "A"),
+ V3 = c("5", "3", "1", "7", "11", "2", "2", "11", "1", "3"))
> df.example
V1 V2 V3
1 A B 5
2 A A 3
3 A B 1
4 B A 7
5 B B 11
6 B A 2
7 A B 2
8 A A 11
9 B B 1
10 B A 3
From df.example I would like to retrieve those rows where V3 is max for each unique combination of V1=V2; here
5 B B 11
8 A A 11
and the minimum for each unique combination when V1=/V2; here
3 A B 1
10 B A 3
This is simplified of course, there are 100's of categories in V1 and V2 in my data frames.
Upvotes: 3
Views: 154
Reputation: 14774
Another option:
library(dplyr)
df.example %>%
group_by(V1, V2) %>%
filter(V3 == eval(parse(text = paste0(c('max', 'min')[(V1 != V2) + 1], '(V3)'))))
Output:
# A tibble: 4 x 3
# Groups: V1, V2 [4]
V1 V2 V3
<chr> <chr> <int>
1 A B 1
2 B B 11
3 B A 2
4 A A 11
Or similarly to @IanCampbell in dplyr
:
library(dplyr)
df.example %>%
group_by(V1, V2) %>%
filter(if (V1 != V2) V3 == min(V3) else V3 == max(V3))
Just add mutate(V3 = as.integer(as.character(V3))) %>%
as second step if your V3
is really not integer
.
Upvotes: 1
Reputation: 24868
Here is a data.table approach.
library(data.table)
setDT(df.example)
#In supplied data, V3 is a character vector
df.example[,V3 := as.numeric(as.matrix(V3))]
df.example[,lapply(.SD,function(x){
if(V1==V2)
{max(V3)}
else
{min(V3)}})
,by=list(V1,V2)]
V1 V2 V3
1: A B 1
2: A A 11
3: B A 2
4: B B 11
Upvotes: 2
Reputation: 40171
One possibility utilizing dplyr
could be:
df.example %>%
filter(V1 == V2) %>%
group_by(V1, V2) %>%
summarise_all(max) %>%
bind_rows(df.example %>%
filter(V1 != V2) %>%
group_by(V1, V2) %>%
summarise_all(min))
V1 V2 V3
<fct> <fct> <dbl>
1 A A 11
2 B B 11
3 A B 1
4 B A 2
Upvotes: 0