Reputation: 135
I have a dataset where multiple individuals attended 6 events. Dataset looks somewhat as follows:
Series event_1 event_2 event_3 event_4 event_5 event_6
1 Bill Alicia Alicia Bill Denise Bill
2 Will Stormy Bill Dob June Denise
3 Stormy Dave Sally Denise Alicia April
I want to create a new variable based on which of these participants attended many events no matter which series. So for instant Bill attended 4 times and Alicia attended three times and Denise attended 3 times as well. So there were at least 2 participants who attended 3 events and one participant who attended 4 times.
Could you help me with the best way of go about this?
Thanks
Upvotes: 3
Views: 75
Reputation: 102529
As per the update from the comment, you can try
transform(
aggregate(
Series ~ event,
reshape(
df,
direction = "long",
idvar = "Series",
v.names = "event",
varying = -1
), sort
),
n = lengths(Series),
CrossSeries = lengths(lapply(Series, unique))
)
which gives
event Series n CrossSeries
1 Alicia 1, 1, 3 3 2
2 April 3 1 1
3 Bill 1, 1, 1, 2 4 2
4 Dave 3 1 1
5 Denise 1, 2, 3 3 3
6 Dob 2 1 1
7 June 2 1 1
8 Sally 3 1 1
9 Stormy 2, 3 2 2
10 Will 2 1 1
Try this
> rev(stack(sort(table(unlist(df[-1])), T)))
ind values
1 Bill 4
2 Alicia 3
3 Denise 3
4 Stormy 2
5 April 1
6 Dave 1
7 Dob 1
8 June 1
9 Sally 1
10 Will 1
Upvotes: 0
Reputation: 1304
In base R you can do
as.data.frame(table(unlist(df[, -1])))
Var1 Freq
1 Alicia 3
2 April 1
3 Bill 4
4 Dave 1
5 Denise 3
6 Dob 1
7 June 1
8 Sally 1
9 Stormy 2
10 Will 1
Upvotes: 1
Reputation: 19339
library(tidyr)
library(dplyr)
pivot_longer(df, -Series) |>
count(value) |>
arrange(-n)
1 Bill 4
2 Alicia 3
3 Denise 3
4 Stormy 2
5 April 1
6 Dave 1
7 Dob 1
8 June 1
9 Sally 1
10 Will 1
df <- structure(list(Series = c(1, 2, 3), event_1 = c("Bill", "Will",
"Stormy"), event_2 = c("Alicia", "Stormy", "Dave"), event_3 = c("Alicia",
"Bill", "Sally"), event_4 = c("Bill", "Dob", "Denise"), event_5 = c("Denise",
"June", "Alicia"), event_6 = c("Bill", "Denise", "April")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L))
Upvotes: 4
Reputation: 8826
df <- data.frame(
Series = c(1, 2, 3),
event_1 = c("Bill", "Will", "Stormy"),
event_2 = c("Alicia", "Stormy", "Dave"),
event_3 = c("Alicia", "Bill", "Sally"),
event_4 = c("Bill", "Dob", "Denise"),
event_5 = c("Denise", "June", "Alicia"),
event_6 = c("Bill", "Denise", "April")
)
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with("event_"),values_to = "participant") %>%
count(participant)
# A tibble: 10 × 2
participant n
<chr> <int>
1 Alicia 3
2 April 1
3 Bill 4
4 Dave 1
5 Denise 3
6 Dob 1
7 June 1
8 Sally 1
9 Stormy 2
10 Will 1
Upvotes: 3