Reputation: 2293
I have a tibble. I need to add a new column in which each value is a function of the corresponding values in several other columns. Here is an example:
library(tibble)
tmp <- tribble(
~ID, ~x1, ~x2,
1, "200", NA,
2, "300", "400")
I want to add a new column, new
, that is TRUE
if and only if any of the corresponding values in x1
and x2
start with "3". That is, I want
# A tibble: 2 x 4
ID x1 x2 new
<dbl> <chr> <chr> <lgl>
1 1 200 <NA> NA
2 2 300 400 TRUE
In this example, new
is a function of only x1
and x2
. But there may be many of these "x" columns, and I won't always be able to write out their names. They will always start with "x", though, so this is one solution:
tmp %>%
mutate(
new = select(., starts_with("x")) %>%
apply(., 1, function (x) any(substr(x, 1, 1)=="3"))
)
But this solution is pretty clunky. Is there a more elegant way?
There are many related questions on Stack Overflow, but they generally speak to cases in which (a) the names of all columns in the original dataset are known and can be written out, or (b) the new
variable is a function of all other columns in the data frame. (Here is one example.)
Upvotes: 3
Views: 112
Reputation: 886938
Here is an option with pivot_longer
where we reshape into 'long' format with pivot_longer
, do a group by 'ID' to check if there are any
value that have 3 as the first digit and do a join with the original dataset
library(dplyr)
library(tidyr)
library(stringr)
tmp %>%
pivot_longer(cols = -ID, values_drop_na = TRUE) %>%
group_by(ID) %>%
summarise(new = any(str_detect(value, '^3'))) %>%
right_join(tmp)
# A tibble: 2 x 4
# ID new x1 x2
#* <dbl> <lgl> <chr> <chr>
#1 1 FALSE 200 <NA>
#2 2 TRUE 300 400
Or using base R
, we can concatenate by row with paste
and use grepl
. Should be more efficient
grepl("(^|,)3", do.call(paste, c(tmp[-1], sep=",")))
#[1] FALSE TRUE
Upvotes: 1
Reputation: 388797
If you want to stay in tidyverse
, we can use pmap
for a row-wise operation :
library(dplyr)
library(purrr)
tmp %>%
mutate(new = pmap_lgl(select(., starts_with('x')),
~any(startsWith(c(...), '3'), na.rm = TRUE)))
# ID x1 x2 new
# <dbl> <chr> <chr> <lgl>
#1 1 200 NA FALSE
#2 2 300 400 TRUE
In base R, we can use row-wise apply
tmp$new <- apply(tmp[grep('x', names(tmp))], 1, function(x)
any(startsWith(x, '3'), na.rm = TRUE))
Upvotes: 1