Reputation: 1801
I'm trying to use ddply (a plyr function) to sort and identify the most frequent interaction type between each unique pair of users in social media data of the following form:
from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment', 'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like')
dat <- data.frame(from, to, interaction_type)
If aggregated correctly, this should give the most common type of interaction between each unique pair of users, regardless of direction (i.e., A-->B and A<--B count as the same pair), like this:
from to type
A B like
A C like
A D share
B C like
B D comment
C D like
While it's easy to get the total count of interactions between any two users by using
count <- ddply(dat, .(from, to), nrow)
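Note that grouping by .(from, to) counts directed pairs, so A-->B and B-->A end up in separate rows. A minimal base-R sketch (not from the original post; user1, user2 and pair_counts are illustrative names) of building an order-independent pair with pmin()/pmax() and counting interactions per undirected pair:

```r
from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment', 'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like')
dat <- data.frame(from, to, interaction_type, stringsAsFactors = FALSE)

# pmin/pmax compare the two users element-wise, so A->B and B->A
# both become user1 = "A", user2 = "B"
dat$user1 <- pmin(dat$from, dat$to)
dat$user2 <- pmax(dat$from, dat$to)

# number of interactions per undirected pair
pair_counts <- aggregate(interaction_type ~ user1 + user2, data = dat, FUN = length)
names(pair_counts)[3] <- "n"
pair_counts
```

The same pmin()/pmax() key can then be used as the grouping variable for finding the most common type instead of just the count.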
I found it hard to apply a similar method to find the most common type of interaction between any given pair. What would be the most efficient way to achieve my desired output? Also, how should I handle possible "tied" cases? (I might just use "tied" as the cell value for all tied cases.)
Upvotes: 1
Views: 191
Reputation: 3183
Similar to Ronak's approach
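library(dplyr)
dat <- data.frame(from, to, interaction_type, stringsAsFactors = FALSE)
dat %>%
  mutate(
    # order-independent pair key, e.g. "AB" for both A->B and B->A
    pair = purrr::pmap_chr(
      .l = list(from = from, to = to),
      .f = function(from, to) paste(sort(c(from, to)), collapse = "")
    )
  ) %>%
  # n = how often each interaction type occurs within its pair
  add_count(pair, interaction_type) %>%
  group_by(pair) %>%
  # keep the most frequent type(s), then the first such row per pair
  filter(n == max(n)) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  select(-pair, -n)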
# A tibble: 6 x 3
from to interaction_type
<chr> <chr> <chr>
1 A B like
2 A D share
3 B C like
4 B D comment
5 C A like
6 C D like
Upvotes: 2
Reputation: 388807
We need to find the most common value (mode) per group, irrespective of the order of the from and to columns.
Taking the Mode function from this answer:
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
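Because which.max() returns the position of the first maximum, this Mode breaks ties in favour of whichever value appears first in the group; ties are resolved silently rather than flagged. A small check (the input vector is made up for illustration):

```r
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # which.max() picks the FIRST maximum
}

# "share" and "like" both occur twice; "share" appears first, so it wins
Mode(c("share", "like", "like", "share"))
# [1] "share"
```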
We can use dplyr to get the first-appearing most frequent value for each group.
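library(dplyr)
dat %>%
  # order-independent key: "AB" for both A->B and B->A
  mutate(key = paste0(pmin(from, to), pmax(from, to))) %>%
  group_by(key) %>%
  # replace each type with the group's mode, then keep one row per pair
  mutate(interaction_type = Mode(interaction_type)) %>%
  slice(1) %>%
  ungroup() %>%
  select(-key)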
# from to interaction_type
# <chr> <chr> <chr>
#1 A B like
#2 C A like
#3 A D share
#4 B C like
#5 B D comment
#6 C D like
I kept the columns as characters by adding stringsAsFactors = FALSE when creating your data.
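For the "tied" cases mentioned in the question, one option is a helper that returns "tied" whenever more than one type shares the top count, instead of silently taking the first. A minimal base-R sketch under that assumption (mode_or_tied is a hypothetical helper, not part of either answer), reusing the pmin()/pmax() pair key:

```r
from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment', 'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like')
dat <- data.frame(from, to, interaction_type, stringsAsFactors = FALSE)

# Hypothetical helper: the most frequent value, or "tied" when several
# values share the highest count
mode_or_tied <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) "tied" else top
}

# Order-independent pair: the user that sorts first goes in user1
dat$user1 <- pmin(dat$from, dat$to)
dat$user2 <- pmax(dat$from, dat$to)

# One row per undirected pair with its most common (or "tied") type
result <- aggregate(interaction_type ~ user1 + user2, data = dat, FUN = mode_or_tied)
names(result) <- c("from", "to", "type")
result <- result[order(result$from, result$to), ]
rownames(result) <- NULL
result

mode_or_tied(c("like", "share"))
# [1] "tied"
```

On the sample data there are no ties, so result should reproduce the table in the question; the last call shows what a tied pair would get.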
Upvotes: 2