user2363777

Select columns based on string values in column content

I have a tibble and want to select only those columns that contain at least one value that matches a regular expression. It took me a while to figure out how to do this, so I'm sharing my solution here.

My use case: I want to select only the columns that contain media filenames, from a tibble like the one below. Importantly, I don't know ahead of time which columns the tibble has, or whether any of them contain media filenames at all.

condition  picture   sound      video      description
A          cat.png   meow.mp3   cat.mp4    A cat
A          dog.png   woof.mp3   dog.mp4    A dog
B          NA        NA         NA         NA
B          bird.png  tjirp.mp3  tjirp.mp4  A bird

R code to reproduce tibble:

dat = structure(list(condition = c("A", "A", "B", "B"), picture = c("cat.png", 
"dog.png", NA, "bird.png"), sound = c("meow.mp3", "woof.mp3", 
NA, "tjirp.mp3"), video = c("cat.mp4", "dog.mp4", NA, "tjirp.mp4"
), description = c("A cat", "A dog", NA, "A bird")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

Answers (1)

user2363777

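Solution: select_if() from dplyr keeps every column for which the predicate returns TRUE. Here the predicate checks whether any value in the column matches the regular expression: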

> dat %>% select_if(~any(grepl("\\.png|\\.mp3|\\.mp4", .)))
# A tibble: 4 x 3
  picture  sound     video    
  <chr>    <chr>     <chr>    
1 cat.png  meow.mp3  cat.mp4  
2 dog.png  woof.mp3  dog.mp4  
3 NA       NA        NA       
4 bird.png tjirp.mp3 tjirp.mp4
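
For readers on dplyr 1.0.0 or later, where select_if() is superseded, the same idea can be sketched with select() and where(). This is a minimal sketch, not the only way to do it: the tibble is rebuilt with tibble() so the snippet runs on its own, and the names media_pattern and media_cols are hypothetical, chosen for illustration. The pattern is also anchored with $ (an assumption about the data: filenames end in the extension), so a value such as "notes.png.txt" would not match.

```r
library(dplyr)

# Rebuild the example tibble so this snippet is self-contained.
dat <- tibble(
  condition   = c("A", "A", "B", "B"),
  picture     = c("cat.png", "dog.png", NA, "bird.png"),
  sound       = c("meow.mp3", "woof.mp3", NA, "tjirp.mp3"),
  video       = c("cat.mp4", "dog.mp4", NA, "tjirp.mp4"),
  description = c("A cat", "A dog", NA, "A bird")
)

# Hypothetical name; anchored so the value must end in a media extension.
media_pattern <- "\\.(png|mp3|mp4)$"

# where() applies the predicate to each column. grepl() returns FALSE for NA,
# so columns with missing values need no special handling.
media_cols <- dat %>%
  select(where(~ any(grepl(media_pattern, .x))))

names(media_cols)
```

With the example data this keeps picture, sound and video. If no column matches, the result is a tibble with zero columns rather than an error, which suits the case where the columns are not known in advance.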

