Reputation: 421
I have a data frame like this:
  User  Tag
1    A TagA
2    A TagB
3    A TagC
4    B TagC
5    C TagA
6    C TagC
I want to group this data by user and label each user according to which tags they have:
All = TagA & TagB & TagC
A_B = TagA & TagB
A_C = TagA & TagC
B_C = TagB & TagC
OnlyA = TagA
OnlyB = TagB
OnlyC = TagC
My final data frame should look like this:
User TagSum
1 A All
2 B OnlyC
3 C A_C
How can I do this in R?
Thanks for your help.
Upvotes: 0
Views: 75
Reputation: 7327
Assuming your initial data frame is called df:
library(dplyr)

df %>%
  arrange(Tag) %>%  # sort so the collapsed tag string has a fixed order
  group_by(User) %>%
  summarise(Tag = paste(unique(Tag), collapse = ", ")) %>%
  mutate(TagSum = case_when(
    Tag == "TagA, TagB, TagC" ~ "All",
    Tag == "TagA, TagB" ~ "A_B",
    Tag == "TagA, TagC" ~ "A_C",
    Tag == "TagB, TagC" ~ "B_C",
    Tag == "TagA" ~ "OnlyA",
    Tag == "TagB" ~ "OnlyB",
    Tag == "TagC" ~ "OnlyC"
  )) %>%
  select(-Tag)
Result:
# A tibble: 3 x 2
User TagSum
<chr> <chr>
1 A All
2 B OnlyC
3 C A_C
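A variation on the same idea: instead of listing every collapsed string in case_when(), the label can be derived from the set of tags each user has. This is only a sketch. label_tags() is a hypothetical helper (not from any package), and it assumes the tags are named "Tag" plus a single letter and that there are exactly three possible tags.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  User = c("A", "A", "A", "B", "C", "C"),
  Tag  = c("TagA", "TagB", "TagC", "TagC", "TagA", "TagC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build the label from the distinct tags of one user.
# Assumes tags look like "TagA", "TagB", "TagC" and that three tags means "All".
label_tags <- function(tags) {
  present <- sort(unique(sub("^Tag", "", tags)))  # e.g. c("A", "C")
  n <- length(present)
  if (n == 3) {
    "All"
  } else if (n == 1) {
    paste0("Only", present)
  } else {
    paste(present, collapse = "_")
  }
}

result <- df %>%
  group_by(User) %>%
  summarise(TagSum = label_tags(Tag))
```

With more than three possible tags the "All" rule would need to compare against the full tag set rather than a hard-coded count.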
Upvotes: 0
Reputation: 1456
Your data:
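library(dplyr)
library(tidyr)
library(tibble)

# tribble() already returns a tibble, so no further conversion is needed
df <- tribble(
  ~User, ~Tag,
  "A", "TagA",
  "A", "TagB",
  "A", "TagC",
  "B", "TagC",
  "C", "TagA",
  "C", "TagC"
)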
Create an indicator column:
df <- df %>% mutate(indicator = TRUE)
Use tidyr::spread() to create a wide indicator tibble, compute one logical column per category, drop the original tag columns, reshape to a long tibble with tidyr::gather(), keep only the TRUE rows, and select the User and TagSum columns:
df <- df %>%
  spread(Tag, indicator, fill = FALSE) %>%
  mutate(
    All   = TagA & TagB & TagC,
    A_B   = TagA & TagB & !TagC,
    A_C   = TagA & !TagB & TagC,
    B_C   = !TagA & TagB & TagC,
    OnlyA = TagA & !TagB & !TagC,
    OnlyB = !TagA & TagB & !TagC,
    OnlyC = !TagA & !TagB & TagC
  ) %>%
  select(-(TagA:TagC)) %>%
  gather(TagSum, Value, All:OnlyC) %>%
  filter(Value) %>%
  arrange(User) %>%
  select(User, TagSum)
Result:
# A tibble: 3 x 2
User TagSum
<chr> <chr>
1 A All
2 B OnlyC
3 C A_C
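spread() and gather() are superseded in current tidyr, so here is a rough sketch of the same pipeline using pivot_wider() and pivot_longer(). It assumes tidyr 1.0.0 or later and the same three tag names as above. The variable name result is only for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as above
df <- tibble::tribble(
  ~User, ~Tag,
  "A", "TagA",
  "A", "TagB",
  "A", "TagC",
  "B", "TagC",
  "C", "TagA",
  "C", "TagC"
)

result <- df %>%
  mutate(indicator = TRUE) %>%
  # one logical column per tag; missing User/Tag pairs become FALSE
  pivot_wider(
    names_from = Tag,
    values_from = indicator,
    values_fill = list(indicator = FALSE)
  ) %>%
  mutate(
    All   = TagA & TagB & TagC,
    A_B   = TagA & TagB & !TagC,
    A_C   = TagA & !TagB & TagC,
    B_C   = !TagA & TagB & TagC,
    OnlyA = TagA & !TagB & !TagC,
    OnlyB = !TagA & TagB & !TagC,
    OnlyC = !TagA & !TagB & TagC
  ) %>%
  select(-(TagA:TagC)) %>%
  # back to long form: one row per User/category pair
  pivot_longer(All:OnlyC, names_to = "TagSum", values_to = "Value") %>%
  filter(Value) %>%
  arrange(User) %>%
  select(User, TagSum)
```

The category definitions are mutually exclusive, so each user ends up with exactly one TRUE row after the filter.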
Upvotes: 2