Ioannis Baltzakis
Ioannis Baltzakis

Reputation: 33

In dplyr groups, how can I select a group that has a combination of a specific value with only ONE other of multiple values

I have a dataset that looks like this

Group  Company
1      A
1      XX
1      A
1      XX
2      XX
2      A
2      XX
2      B
3      B
3      B
3      XX
3      A
4      C
4      C
4      XX
4      XX

It's actually bigger with more columns and rows and values, but the problem I want to solve involves these two columns for now.

Now, what I would like to achieve is, for each Group, to replace Company in the following way:

If XX exists in a Group, together with any other unique Company value, then Company should be replaced with this unique value. Like so:

Group  Company
1      A
1      XX
1      A
1      XX

Should become:

Group  Company
1      A
1      A
1      A
1      A

If any Group contains more than 2 unique values for Company, including "XX", then this group's Company value should remain unchanged, like so:

Group  Company
2      XX
2      A
2      XX
2      B

I am trying to achieve this using dplyr, I managed to do what I wanted but for only one value of Company, like so:

result <- df%>%
group_by(Group) %>%
mutate(final = ifelse(Company %in% c("A", "XX"), 1,2)) %>% 
summarize(test = mean(final))

result$final <- ifelse(result$test== 1, "A", result$Company)

I could probably do this with a loop, but I am pretty sure this would not be the best way to do it.

Is there a clean way to do this with dplyr ? (of course other solutions are more than welcome)

The final dataframe would then look like this:

Group  Company
1      A
1      A
1      A
1      A
2      XX
2      A
2      XX
2      B
3      B
3      B
3      XX
3      A
4      C
4      C
4      C
4      C

Thank you all for your time.

Upvotes: 0

Views: 215

Answers (1)

Shree
Shree

Reputation: 11150

Here's a way with dplyr -

df %>%
  group_by(Group) %>%
  mutate(
    Company = case_when(
                ("XX" %in% Company) && (n_distinct(Company) == 2) ~ Company[Company != "XX"][1], 
                TRUE ~ Company
              )
  ) %>% 
  ungroup()

# A tibble: 16 x 2
   Group Company
   <int> <chr>  
 1     1 A      
 2     1 A      
 3     1 A      
 4     1 A      
 5     2 XX     
 6     2 A      
 7     2 XX     
 8     2 B      
 9     3 B      
10     3 B      
11     3 XX     
12     3 A      
13     4 C      
14     4 C      
15     4 C      
16     4 C 

Upvotes: 3

Related Questions