Reputation: 2067
I am trying to create new columns in my data based on the dates.
What I would like to do is to get the data in the following format:
dates yes_no grp event 2017-01-02 2017-01-04
1 2017-01-01 364 1 NA NA NA
2 2017-01-02 2451 1 1 1 NA
3 2017-01-03 32440 1 NA NA NA
4 2017-01-04 234341 1 1 NA 1
5 2017-01-05 123443 1 NA NA NA
6 2017-01-01 7273 2 NA NA NA
7 2017-01-02 2331 2 1 1 NA
8 2017-01-03 12332 2 NA NA NA
9 2017-01-04 12233 2 1 NA 1
10 2017-01-05 1232323 2 NA NA NA
I have 2 groups grp
1 and 2. The event happens on the 2017-01-02
and 2017-01-04
for both groups. I am trying to create a single column for each of these two dates and populate the column if the event occurs on that date.
I originally thought about using case_when
and then pivot_wider
using the following:
d %>%
mutate(event = case_when(
dates == "2017-01-02" ~ 1,
dates == "2017-01-04" ~ 1
)) %>%
pivot_wider(names_from = dates, values_from = yes_no)
This doesn't work on my data because I run into memory issues and I do not want to create a column for each date, just the dates which has the event = 1
. Alternatively it might be useful to create the columns manually and then populate the date directly.
i.e.
d$2017-01-02
= 0
ifelse(dates = "2017-01-02", 2017-01-02
= 1, 0)
However I have about 30 dates I want to populate and create new columns.
Data:
d <- data.frame(
dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05",
"2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
yes_no = c(0364, 2451, 32440, 234341, 123443,
7273, 2331, 12332, 12233, 1232323),
grp = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)
Upvotes: 0
Views: 230
Reputation: 886928
We can also do this in base R
with
dates = c("2017-01-02", "2017-01-04")
d[dates] <- NA^(sapply(dates, `!=`, d$dates))
Upvotes: 1
Reputation: 12410
Here is a (mostly) tidy solution that works well if you have more than the two dates you mention:
library(dplyr)
dates <- c("2017-01-02", "2017-01-04")
d %>%
{
l <- lapply(dates, function(x) ifelse(x == .$dates, 1, NA))
names(l) <- dates
cbind(., l)
} %>%
mutate(event = select(., starts_with("201")) %>% rowSums(na.rm = TRUE))
#> dates yes_no grp 2017-01-02 2017-01-04 event
#> 1 2017-01-01 364 1 NA NA 0
#> 2 2017-01-02 2451 1 1 NA 1
#> 3 2017-01-03 32440 1 NA NA 0
#> 4 2017-01-04 234341 1 NA 1 1
#> 5 2017-01-05 123443 1 NA NA 0
#> 6 2017-01-01 7273 2 NA NA 0
#> 7 2017-01-02 2331 2 1 NA 1
#> 8 2017-01-03 12332 2 NA NA 0
#> 9 2017-01-04 12233 2 NA 1 1
#> 10 2017-01-05 1232323 2 NA NA 0
Created on 2019-11-28 by the reprex package (v0.3.0)
If you don't like how this looks you can move the loop into a function to accomplish the same:
add_date_column <- function(tbl, dates) {
l <- lapply(dates, function(x) ifelse(x == tbl$dates, 1, NA))
names(l) <- dates
cbind(tbl, l)
}
d %>%
add_date_column(dates) %>%
mutate(event = select(., starts_with("201")) %>% rowSums(na.rm = TRUE))
Upvotes: 1
Reputation: 345
Why wouldn't you create the date columns explicitly?
d %>%
mutate(event = case_when(
dates == "2017-01-02" ~ 1,
dates == "2017-01-04" ~ 1
),
`2017-01-02` = ifelse(dates == "2017-01-02", 1, NA),
`2017-01-04` = ifelse(dates == "2017-01-04", 1, NA),
)
Upvotes: 2
Reputation: 388797
In base R, we can use lapply
to add new columns
dates = c("2017-01-02", "2017-01-04")
d[dates] <- lapply(dates, function(x) ifelse(d$dates == x, 1, NA))
d
# dates yes_no grp 2017-01-02 2017-01-04
#1 2017-01-01 364 1 NA NA
#2 2017-01-02 2451 1 1 NA
#3 2017-01-03 32440 1 NA NA
#4 2017-01-04 234341 1 NA 1
#5 2017-01-05 123443 1 NA NA
#6 2017-01-01 7273 2 NA NA
#7 2017-01-02 2331 2 1 NA
#8 2017-01-03 12332 2 NA NA
#9 2017-01-04 12233 2 NA 1
#10 2017-01-05 1232323 2 NA NA
In purrr
lapply
can be replaced with map
d[dates] <- purrr::map(dates, ~ifelse(d$dates == .x, 1, NA))
Upvotes: 2