Timothy_Goodman

Reputation: 421

Group Data in R

I have a data frame like this:

   User Tag
1 A TagA
2 A TagB
3 A TagC
4 B TagC
5 C TagA
6 C TagC

I want to group this data per user, labelling each user according to which tags they have:

All = TagA & TagB & TagC
A_B = TagA & TagB
A_C = TagA & TagC
B_C = TagB & TagC
OnlyA = TagA
OnlyB = TagB
OnlyC = TagC

My final data frame should look like this:

   User TagSum
1 A All
2 B OnlyC
3 C A_C

How can I do this in R?

Thanks for your help.

Upvotes: 0

Views: 75

Answers (2)

prosoitos

Reputation: 7327

Considering that your initial data frame is called df:

library(dplyr)

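df %>%
  # sort by Tag so each user's tags are pasted in a fixed (alphabetical) order
  arrange(Tag) %>%
  group_by(User) %>%
  # collapse each user's distinct tags into one string, e.g. "TagA, TagC"
  summarise(Tag = paste(unique(Tag), collapse = ", ")) %>%
  # map each combination to its label; unlisted combinations become NA
  mutate(TagSum = case_when(
    Tag == "TagA, TagB, TagC" ~ "All",
    Tag == "TagA, TagB" ~ "A_B",
    Tag == "TagA, TagC" ~ "A_C",
    Tag == "TagB, TagC" ~ "B_C",
    Tag == "TagA" ~ "OnlyA",
    Tag == "TagB" ~ "OnlyB",
    Tag == "TagC" ~ "OnlyC"
  )) %>%
  select(-Tag)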

Result:

# A tibble: 3 x 2
  User  TagSum
  <chr> <chr> 
1 A     All   
2 B     OnlyC 
3 C     A_C   
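
If the list of combinations grows, the chain of case_when() conditions can be swapped for a named lookup vector. The following is only a sketch of that variation in base R, not part of the answer above; the names tag_labels, keys and result are made up for illustration, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Example data from the question
df <- data.frame(
  User = c("A", "A", "A", "B", "C", "C"),
  Tag  = c("TagA", "TagB", "TagC", "TagC", "TagA", "TagC"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: sorted tag combination -> label
tag_labels <- c(
  "TagA, TagB, TagC" = "All",
  "TagA, TagB"       = "A_B",
  "TagA, TagC"       = "A_C",
  "TagB, TagC"       = "B_C",
  "TagA"             = "OnlyA",
  "TagB"             = "OnlyB",
  "TagC"             = "OnlyC"
)

# One key per user: that user's distinct tags, sorted and pasted together
keys <- vapply(
  split(df$Tag, df$User),
  function(tags) paste(sort(unique(tags)), collapse = ", "),
  character(1)
)

# Look the keys up; combinations missing from tag_labels come back as NA
result <- data.frame(
  User   = names(keys),
  TagSum = unname(tag_labels[keys]),
  stringsAsFactors = FALSE
)

print(result)
```

Adding a new combination then only means adding one entry to tag_labels.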

Upvotes: 0

Marian Minar

Reputation: 1456

Your data:

library(dplyr)
library(tidyr)
library(tibble)

# tribble() already returns a tibble, so no conversion is needed
df <- tribble(
  ~User, ~Tag,
  "A", "TagA",
  "A", "TagB",
  "A", "TagC",
  "B", "TagC",
  "C", "TagA",
  "C", "TagC"
)

Create an indicator column:

df <- df %>% mutate(indicator = TRUE)

Use tidyr::spread() to widen the data into one logical column per tag, compute the combination flags with mutate(), drop the original tag columns, reshape back to long form with tidyr::gather(), keep only the TRUE rows, and select the User and TagSum columns:

df <- df %>%
  # one logical column per tag; FALSE where a user lacks the tag
  spread(Tag, indicator, fill = FALSE) %>%
  # the conditions are mutually exclusive, so exactly one flag is TRUE per user
  mutate(All = TagA & TagB & TagC,
         A_B = TagA & TagB & !TagC,
         A_C = TagA & !TagB & TagC,
         B_C = !TagA & TagB & TagC,
         OnlyA = TagA & !TagB & !TagC,
         OnlyB = !TagA & TagB & !TagC,
         OnlyC = !TagA & !TagB & TagC
  ) %>%
  select(-(TagA:TagC)) %>%
  # back to long form: one row per user and flag
  gather(TagSum, Value, All:OnlyC) %>%
  filter(Value) %>%
  arrange(User) %>%
  select(User, TagSum)

Result:

# A tibble: 3 x 2
  User  TagSum
  <chr> <chr> 
1 A     All   
2 B     OnlyC 
3 C     A_C 
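
Note that spread() and gather() still work but are marked as superseded in tidyr since version 1.0.0, in favour of pivot_wider() and pivot_longer(). Below is a sketch of the same pipeline with the newer functions, assuming tidyr 1.0.0 or later; the data frame is rebuilt from the question, and the distinct() call is an extra safeguard against repeated User/Tag rows that the original code does not have.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  User = c("A", "A", "A", "B", "C", "C"),
  Tag  = c("TagA", "TagB", "TagC", "TagC", "TagA", "TagC"),
  stringsAsFactors = FALSE
)

result <- df %>%
  distinct(User, Tag) %>%          # assumption: duplicates should count once
  mutate(indicator = TRUE) %>%
  # one logical column per tag; FALSE where a user lacks the tag
  pivot_wider(names_from = Tag, values_from = indicator,
              values_fill = list(indicator = FALSE)) %>%
  mutate(All   = TagA & TagB & TagC,
         A_B   = TagA & TagB & !TagC,
         A_C   = TagA & !TagB & TagC,
         B_C   = !TagA & TagB & TagC,
         OnlyA = TagA & !TagB & !TagC,
         OnlyB = !TagA & TagB & !TagC,
         OnlyC = !TagA & !TagB & TagC) %>%
  select(-(TagA:TagC)) %>%
  # back to long form: one row per user and flag
  pivot_longer(cols = All:OnlyC, names_to = "TagSum", values_to = "Value") %>%
  filter(Value) %>%
  arrange(User) %>%
  select(User, TagSum)

print(result)
```

Like the original, this assumes every one of TagA, TagB and TagC appears at least once in the data; otherwise the corresponding column is never created and the mutate() step fails.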

Upvotes: 2
