jgozal

Reputation: 1582

Counting the times a value in a vector is different per combination of 4 other vectors

This is what my dataframe looks like:

a <- c(1,1,4,4,5)
b <- c(1,2,3,3,5)
c <- c(1,4,4,4,5)
d <- c(2,2,4,4,5)
e <- c(1,5,3,2,5)

df <- data.frame(a,b,c,d,e)

I'd like to write something that returns each unique combination of values in vectors a,b,c,d that occurs with more than one distinct value in vector e.

For example:

  a b c d e 
1 1 1 1 2 1 
2 1 2 4 2 5 
3 4 3 4 4 3 
4 4 3 4 4 2 
5 5 5 5 5 5 

Rows 3 and 4 are identical up to vector d (both have the combination 4344), so only one instance of that combination should be returned, but they have 2 different values in vector e. I would like a count of those: the combination 4344 has 2 different values in vector e.

The expected output would tell me how many times a certain combination such as 4344 had different values in vector e. So in this case it would be something like:

a b c d   e
4 3 4 4   2

So far I have something like this:

library(tidyr)
library(dplyr)

df %>%
  unite(key_abcd, a, b, c, d) %>%
  count(key_abcd, e)

But this counts how many times each value of e is repeated per combination of a,b,c,d. I would instead like to count how many distinct values e takes per combination of a,b,c,d.

NOTE: There are both repeated combinations of values in vectors a,b,c,d and repeated values in vector e. I would like to return only the count of unique values in e for unique combinations of a,b,c,d.

Upvotes: 0

Views: 118

Answers (1)

jeremycg

Reputation: 24945

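You could try adding a little dplyr on: group by the combined key and count the distinct values of e with n_distinct(). Note that n() would count rows instead, so repeated values of e would be counted more than once:

library(tidyr)  # for unite()
library(dplyr)

df %>%
  unite(key_abcd, a, b, c, d) %>%
  group_by(key_abcd) %>%
  summarise(e = n_distinct(e)) %>%  # number of distinct e values per combination
  filter(e > 1)                     # keep combinations with more than one distinct e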
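If you would rather keep a, b, c and d as separate columns, as in the expected output of the question, a base R sketch along these lines may work. It rebuilds the df from the question; the names distinct_counts and multi are only illustrative and not part of the original post.

```r
# Data from the question
df <- data.frame(
  a = c(1, 1, 4, 4, 5),
  b = c(1, 2, 3, 3, 5),
  c = c(1, 4, 4, 4, 5),
  d = c(2, 2, 4, 4, 5),
  e = c(1, 5, 3, 2, 5)
)

# Count the distinct values of e within each (a, b, c, d) combination.
# length(unique(x)) ignores repeats, unlike length(x), which counts rows.
distinct_counts <- aggregate(
  e ~ a + b + c + d,
  data = df,
  FUN = function(x) length(unique(x))
)

# Keep only the combinations that have more than one distinct value of e
multi <- distinct_counts[distinct_counts$e > 1, ]
print(multi)
```

With the sample data this should leave a single row for the combination a = 4, b = 3, c = 4, d = 4, where e holds the count 2.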

Upvotes: 2
