Reputation: 1071
I have a tibble and want to select only those columns that contain at least one value that matches a regular expression. It took me a while to figure out how to do this, so I'm sharing my solution here.
My use case: I want to select only those columns that include media filenames, from a tibble like the one below. Importantly, I don't know ahead of time what columns the tibble consists of, and whether or not there are any columns that include media filenames.
condition | picture | sound | video | description |
---|---|---|---|---|
A | cat.png | meow.mp3 | cat.mp4 | A cat |
A | dog.png | woof.mp3 | dog.mp4 | A dog |
B | NA | NA | NA | NA |
B | bird.png | tjirp.mp3 | tjirp.mp4 | A bird |
R code to reproduce tibble:
dat = structure(list(condition = c("A", "A", "B", "B"), picture = c("cat.png",
"dog.png", NA, "bird.png"), sound = c("meow.mp3", "woof.mp3",
NA, "tjirp.mp3"), video = c("cat.mp4", "dog.mp4", NA, "tjirp.mp4"
), description = c("A cat", "A dog", NA, "A bird")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 1
Views: 453
Reputation: 1071
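Solution: with dplyr loaded (library(dplyr)), use select_if() with a predicate that returns TRUE when at least one value in the column matches the regular expression. grepl() returns FALSE for NA, so the missing values in row 3 do not interfere.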
> dat %>% select_if(~any(grepl("\\.png|\\.mp3|\\.mp4", .)))
# A tibble: 4 x 3
picture sound video
<chr> <chr> <chr>
1 cat.png meow.mp3 cat.mp4
2 dog.png woof.mp3 dog.mp4
3 NA NA NA
4 bird.png tjirp.mp3 tjirp.mp4
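For newer dplyr versions (1.0.0 and later), where select_if() is superseded, the same idea can be sketched with select() and where(). This is only a sketch under a few assumptions: the helper name select_matching_cols is made up for illustration, the tibble is rebuilt with tibble() rather than structure(), and the pattern is anchored with $ so that only values ending in one of the extensions count as a match (the original pattern also matches the extension anywhere in the string).

```r
library(dplyr)

# Same data as in the question, rebuilt with tibble() for readability.
dat <- tibble(
  condition   = c("A", "A", "B", "B"),
  picture     = c("cat.png", "dog.png", NA, "bird.png"),
  sound       = c("meow.mp3", "woof.mp3", NA, "tjirp.mp3"),
  video       = c("cat.mp4", "dog.mp4", NA, "tjirp.mp4"),
  description = c("A cat", "A dog", NA, "A bird")
)

# Assumption: anchoring with $ is wanted, so "notes.png.txt" would not match.
media_pattern <- "\\.(png|mp3|mp4)$"

# Hypothetical helper: keep only the columns in which at least one value
# matches `pattern`. grepl() yields FALSE for NA, so NAs never count as a match.
select_matching_cols <- function(data, pattern) {
  select(data, where(function(col) any(grepl(pattern, col))))
}

media_cols <- select_matching_cols(dat, media_pattern)
names(media_cols)

# When no column matches, the result keeps its rows but has zero columns,
# which can be checked with ncol().
no_match <- select_matching_cols(dat, "\\.xyz$")
ncol(no_match)
```

Because the column names are not known ahead of time, checking ncol() on the result is one way to detect that the tibble contains no media columns at all.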
Upvotes: 2