Reputation: 25
I have data that looks like this:
id a b c
1 346 BBX BBX RRS
2 677 BBX TFT TFT
3 999 BBX CAT CAT
What I want to do is: for each row, find the value across a,b,c that has the most matches. So the result would look like this:
id y
1 346 BBX
2 677 TFT
3 999 CAT
If there are cases where there is no column with more matches than any other, I am indifferent to which value is chosen.
Any suggestions?
I also got as far as transforming the data in the manner below, and creating a variable "n", for the number of occurrences of each x. But not sure how "select" the x with highest n for each id.
id x n
1 346 BBX 2
2 346 RRS 1
3 677 BBX 1
4 677 TFT 2
5 999 BBX 1
6 999 CAT 2
Here's some code the data:
id <- c("346", "677", "999")
a <- rep ("BBX", 3)
b <- c("BBX", "TFT", "CAT")
c <- c("RRS", "TFT", "CAT")
x <- data.frame(id, a,b,c, stringsAsFactors=F)
or
id <- c("346", "346", "677", "677", "999", "999")
x <- c("BBX", "RRS", "BBX", "TFT", "BBX", "CAT")
n <- c("2", "1", "1", "2", "1", "2")
bl <- data.frame(id, x, n)
THANK YOU!
Upvotes: 1
Views: 122
Reputation: 46888
If you have already pivot it longer, you can use top_n, like this:
x %>% pivot_longer(-id) %>% group_by(id) %>%
count(value) %>% arrange(id,-n) %>%
filter(!duplicated(id))
# A tibble: 3 x 3
# Groups: id [3]
id value n
<chr> <chr> <int>
1 346 BBX 2
2 677 TFT 2
3 999 CAT 2
If you have NAs, for example, you can try to filter them out:
x[1,3:4]= NA
x %>% pivot_longer(-id) %>%
group_by(id) %>% filter(!is.na(value)) %>%
count(value) %>% arrange(id,-n) %>%
filter(!duplicated(id))
Or in base R, you can apply a sort(table(..)) function row-wise:
most_freq = apply(x[,-1],1,function(i)names(sort(-table(i)))[1])
data.frame(id=x$id,most_freq)
id most_freq
1 346 BBX
2 677 TFT
3 999 CAT
Upvotes: 1