Reputation: 33
I have a dataset that looks like this
Group Company
1 A
1 XX
1 A
1 XX
2 XX
2 A
2 XX
2 B
3 B
3 B
3 XX
3 A
4 C
4 C
4 XX
4 XX
It's actually bigger with more columns and rows and values, but the problem I want to solve involves these two columns for now.
Now, what I would like to achieve is, for each Group, to replace Company in the following way:
If XX exists in a Group, together with any other unique Company value, then Company should be replaced with this unique value. Like so:
Group Company
1 A
1 XX
1 A
1 XX
Should become:
Group Company
1 A
1 A
1 A
1 A
If any Group contains more than 2 unique values for Company, including "XX", then this group's Company value should remain unchanged, like so:
Group Company
2 XX
2 A
2 XX
2 B
I am trying to achieve this using dplyr, I managed to do what I wanted but for only one value of Company, like so:
result <- df%>%
group_by(Group) %>%
mutate(final = ifelse(Company %in% c("A", "XX"), 1,2)) %>%
summarize(test = mean(final))
result$final <- ifelse(result$test== 1, "A", result$Company)
I could probably do this with a loop, but I am pretty sure this would not be the best way to do it.
Is there a clean way to do this with dplyr ? (of course other solutions are more than welcome)
The final dataframe would then look like this:
Group Company
1 A
1 A
1 A
1 A
2 XX
2 A
2 XX
2 B
3 B
3 B
3 XX
3 A
4 C
4 C
4 C
4 C
Thank you all for your time.
Upvotes: 0
Views: 215
Reputation: 11150
Here's a way with dplyr
-
df %>%
group_by(Group) %>%
mutate(
Company = case_when(
("XX" %in% Company) && (n_distinct(Company) == 2) ~ Company[Company != "XX"][1],
TRUE ~ Company
)
) %>%
ungroup()
# A tibble: 16 x 2
Group Company
<int> <chr>
1 1 A
2 1 A
3 1 A
4 1 A
5 2 XX
6 2 A
7 2 XX
8 2 B
9 3 B
10 3 B
11 3 XX
12 3 A
13 4 C
14 4 C
15 4 C
16 4 C
Upvotes: 3