Patrick Balada
Patrick Balada

Reputation: 1450

How can I make conditional selections using dplyr in R?

I have the following situation. Given the table

df <- data.frame(ID = c(1, 2, 2, 3, 3, 4),
             type = c("MC", "MC", "MK", "MC", "MK", "MC"),
             value1 = c(512, 261, 4523, 1004, 1221, 2556),
             value2 = c(726, 4000, 280, 998, 113, 6789))

I am trying to find a way to implement the following logic: If for an ID, both types (MC and MK) occur, use value1 from MK and value2 from MC. Otherwise (only the type MC occurs), use MC.

Hence, the final result is supposed to be:

data.frame(ID = c(1, 2, 3, 4),
             type = c("MC", "MC", "MC", "MC"),
             value1 = c(512, 4523, 1221, 2556),
             value2 = c(726, 4000, 998, 6789))

Assuming the type MK is dropped after extracting the value1.

Upvotes: 1

Views: 53

Answers (3)

Andre Elrico
Andre Elrico

Reputation: 11500

data.table solution

setDT(df1)[,{x=.SD;if(all(c("MC","MK") %in% type)){x$value1[] = last(value1)};first(x)},by=ID]

result:

#  ID type value1 value2
#1  1   MC    512    726
#2  2   MC   4523   4000
#3  3   MC   1221    998
#4  4   MC   2556   6789

dplyr:

df1 %>% group_by(ID) %>% do(.,(function(x){if(all(c("MC","MK") %in% x$type)){x$value1[] = x$value1[x$type=="MK"]};x[1,]})(.))

# A tibble: 4 x 4
# Groups:   ID [4]
#     ID type  value1 value2
#  <dbl> <fct>  <dbl>  <dbl>
#1     1 MC       512    726
#2     2 MC      4523   4000
#3     3 MC      1221    998
#4     4 MC      2556   6789

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389335

Another version with dplyr

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(value1 = ifelse(any(type == "MK"), value1[type=="MK"],value1[type=="MC"]), 
         value2 = value2[type == "MC"]) %>%
  filter(type == "MC")

#     ID type  value1 value2
#  <dbl> <fct>  <dbl>  <dbl>
#1     1 MC       512    726
#2     2 MC      4523   4000
#3     3 MC      1221    998
#4     4 MC      2556   6789

Here, for value1 we check value in "MK" if it is present or take corresponding "MC" value instead and for value2 by default we take "MC" value and keep only rows with type "MC". This is assuming every group (ID) would have a "MC" type row.

Upvotes: 2

nghauran
nghauran

Reputation: 6778

For efficiency I would definitely prefer @Andre Elrico' answer but here is a dplyr option. Try:

df <- data.frame(ID = c(1, 2, 2, 3, 3, 4),
                 type = c("MC", "MC", "MK", "MC", "MK", "MC"),
                 value1 = c(512, 261, 4523, 1004, 1221, 2556),
                 value2 = c(726, 4000, 280, 998, 113, 6789)) 
library(dplyr)
df %>%
  reshape(., idvar = "ID", timevar = "type", direction = "wide") %>%
  group_by(ID) %>%
  mutate(value1 = ifelse(is.na(value1.MK), value1.MC, value1.MK),
         value2 = ifelse(is.na(value2.MC), value2.MK, value2.MC),
         type = "MC") %>%
  select(ID, type, value1, value2)
# output
# A tibble: 4 x 4
# Groups:   ID [4]
     ID  type value1 value2
  <dbl> <chr>  <dbl>  <dbl>
1     1    MC    512    726
2     2    MC   4523   4000
3     3    MC   1221    998
4     4    MC   2556   6789

Upvotes: 1

Related Questions