user8959427
user8959427

Reputation: 2067

using dplyr to create new columns based on condition from data in R

I am trying to create new columns in my data based on the dates.

What I would like to do is to get the data in the following format:

        dates  yes_no grp event    2017-01-02     2017-01-04
1  2017-01-01     364   1    NA       NA            NA
2  2017-01-02    2451   1     1        1            NA
3  2017-01-03   32440   1    NA       NA            NA
4  2017-01-04  234341   1     1       NA             1
5  2017-01-05  123443   1    NA       NA            NA
6  2017-01-01    7273   2    NA       NA            NA
7  2017-01-02    2331   2     1        1            NA
8  2017-01-03   12332   2    NA       NA            NA
9  2017-01-04   12233   2     1       NA             1
10 2017-01-05 1232323   2    NA       NA            NA

I have 2 groups grp 1 and 2. The event happens on the 2017-01-02 and 2017-01-04 for both groups. I am trying to create a single column for each of these two dates and populate the column if the event occurs on that date.

I originally thought about using case_when and then pivot_wider using the following:

d %>% 
  mutate(event = case_when(
    dates == "2017-01-02" ~ 1,
    dates == "2017-01-04" ~ 1
  )) %>% 
  pivot_wider(names_from = dates, values_from = yes_no)

This doesn't work on my data because I run into memory issues and I do not want to create a column for each date, just the dates which has the event = 1. Alternatively it might be useful to create the columns manually and then populate the date directly.

i.e. d$2017-01-02 = 0 ifelse(dates = "2017-01-02", 2017-01-02 = 1, 0)

However I have about 30 dates I want to populate and create new columns.

Data:

   d <- data.frame(
  dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05",
            "2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
  yes_no = c(0364, 2451, 32440, 234341, 123443,
             7273, 2331, 12332, 12233, 1232323),
  grp = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

Upvotes: 0

Views: 230

Answers (4)

akrun
akrun

Reputation: 886928

We can also do this in base R with

dates = c("2017-01-02", "2017-01-04")
d[dates] <- NA^(sapply(dates, `!=`, d$dates))

Upvotes: 1

JBGruber
JBGruber

Reputation: 12410

Here is a (mostly) tidy solution that works well if you have more than the two dates you mention:

library(dplyr)
dates <- c("2017-01-02", "2017-01-04")
d %>% 
  {
    l <- lapply(dates, function(x) ifelse(x == .$dates, 1, NA))
    names(l) <- dates
    cbind(., l)
  } %>% 
  mutate(event = select(., starts_with("201")) %>% rowSums(na.rm = TRUE))
#>         dates  yes_no grp 2017-01-02 2017-01-04 event
#> 1  2017-01-01     364   1         NA         NA     0
#> 2  2017-01-02    2451   1          1         NA     1
#> 3  2017-01-03   32440   1         NA         NA     0
#> 4  2017-01-04  234341   1         NA          1     1
#> 5  2017-01-05  123443   1         NA         NA     0
#> 6  2017-01-01    7273   2         NA         NA     0
#> 7  2017-01-02    2331   2          1         NA     1
#> 8  2017-01-03   12332   2         NA         NA     0
#> 9  2017-01-04   12233   2         NA          1     1
#> 10 2017-01-05 1232323   2         NA         NA     0

Created on 2019-11-28 by the reprex package (v0.3.0)

If you don't like how this looks you can move the loop into a function to accomplish the same:

add_date_column <- function(tbl, dates) {
    l <- lapply(dates, function(x) ifelse(x == tbl$dates, 1, NA))
    names(l) <- dates
    cbind(tbl, l)
}
d %>% 
  add_date_column(dates) %>% 
  mutate(event = select(., starts_with("201")) %>% rowSums(na.rm = TRUE))

Upvotes: 1

Jeroen Colin
Jeroen Colin

Reputation: 345

Why wouldn't you create the date columns explicitly?

d %>% 
  mutate(event = case_when(
    dates == "2017-01-02" ~ 1,
    dates == "2017-01-04" ~ 1
  ),
  `2017-01-02` = ifelse(dates == "2017-01-02", 1, NA),
  `2017-01-04` = ifelse(dates == "2017-01-04", 1, NA),
  )

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388797

In base R, we can use lapply to add new columns

dates = c("2017-01-02", "2017-01-04")
d[dates] <- lapply(dates, function(x) ifelse(d$dates == x, 1, NA))
d

#        dates  yes_no grp 2017-01-02 2017-01-04
#1  2017-01-01     364   1         NA         NA
#2  2017-01-02    2451   1          1         NA
#3  2017-01-03   32440   1         NA         NA
#4  2017-01-04  234341   1         NA          1
#5  2017-01-05  123443   1         NA         NA
#6  2017-01-01    7273   2         NA         NA
#7  2017-01-02    2331   2          1         NA
#8  2017-01-03   12332   2         NA         NA
#9  2017-01-04   12233   2         NA          1
#10 2017-01-05 1232323   2         NA         NA

In purrr lapply can be replaced with map

d[dates] <- purrr::map(dates, ~ifelse(d$dates == .x, 1, NA))

Upvotes: 2

Related Questions