Reputation: 1582
This is what my dataframe looks like:
a <- c(1,1,4,4,5)
b <- c(1,2,3,3,5)
c <- c(1,4,4,4,5)
d <- c(2,2,4,4,5)
e <- c(1,5,3,2,5)
df <- data.frame(a,b,c,d,e)
I'd like to write something that returns each unique combination of a, b, c, d that occurs with more than one distinct value in e.
For example:
  a b c d e
1 1 1 1 2 1
2 1 2 4 2 5
3 4 3 4 4 3
4 4 3 4 4 2
5 5 5 5 5 5
Rows 3 and 4 are identical in columns a through d (the combination 4, 3, 4, 4), so that combination should be returned only once. They have 2 different values in e, though, and that is what I want to count: the combination 4, 3, 4, 4 has 2 different values in e.
The expected output would tell me how many different values of e each such combination has. In this case it would be something like the following, where e holds the count:
a b c d e
4 3 4 4 2
So far I have something like this:
library(tidyr)
library(dplyr)
df %>%
  unite(key_abcd, a, b, c, d) %>%
  count(key_abcd, e)
But this will count the times e has been repeated per combination of a,b,c,d. I would like to instead count the times e is different per combination of a,b,c,d.
NOTE: There are both repeated combinations of values in vectors a,b,c,d and repeated values in vector e. I would like to return only the count of unique values in e for unique combinations of a,b,c,d.
Upvotes: 0
Views: 118
Reputation: 24945
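You could try adding a little more dplyr on. Use n_distinct(e) rather than n() here: n() counts rows per combination, so it would overcount whenever the same value of e repeats within a combination.
library(tidyr)
library(dplyr)
df %>%
  unite(key_abcd, a, b, c, d) %>%
  group_by(key_abcd) %>%
  summarise(e = n_distinct(e)) %>%  # number of different e values per combination
  filter(e > 1)                     # keep combinations with more than one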
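If you would rather keep a, b, c, d as separate columns, as in the expected output in the question, one option is to group on the four columns directly instead of uniting them. This is only a minimal sketch: it rebuilds the example data from the question, the name result is just illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  a = c(1, 1, 4, 4, 5),
  b = c(1, 2, 3, 3, 5),
  c = c(1, 4, 4, 4, 5),
  d = c(2, 2, 4, 4, 5),
  e = c(1, 5, 3, 2, 5)
)

result <- df %>%
  group_by(a, b, c, d) %>%                            # one group per combination
  summarise(e = n_distinct(e), .groups = "drop") %>%  # count distinct e values
  filter(e > 1)                                       # keep combinations with more than one

print(result)
```

With the example data this should leave a single row for the combination 4, 3, 4, 4 with e equal to 2. Repeated values of e within a combination are counted once, which is the case described in the NOTE.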
Upvotes: 2