Raquel Feltrin

Reputation: 147

filter with a list of string conditions

This is an example of what the data looks like:

height <- c("T_0.1", "T_0.2", "T_0.3", "T_0.11", "T_0.12", "T_0.13",
            "T_10.1", "T_10.2", "T_10.3", "T_10.11", "T_10.12", "T_10.13",
            "T_36.1", "T_36.2", "T_36.3", "T_36.11", "T_36.12", "T_36.13")
value <- c(1, 12, 14, 15, 20, 22, 5, 9, 4, 0.0, 0.45, 0.7, 1, 2, 7, 100, 9, 45)

df <- data.frame(height, value)

I want to keep all the rows whose height ends with ".1", ".2", or ".3". However, I want to do that using a "list of patterns", because the actual data frame has more than 1000 values.

Here is what I tried:

vars_list <- c(".1", ".2", ".3")

df_new <- df[grepl(paste(vars_list, collapse = "|"), df$height), ]

matchPattern <- paste(vars_list, collapse = "|")
df_new <- df %>% select(matches(matchPattern))

Both attempts return 0 observations. I am not sure what the issue is, and I couldn't find a post that helps. Any help is very much appreciated!

Upvotes: 1

Views: 66

Answers (2)

jkatam

Reputation: 3447

Alternatively, use the base R function endsWith() inside dplyr's filter():

library(dplyr)
df <- data.frame(height, value) %>% filter(endsWith(height, vars_list))

Created on 2023-02-12 with reprex v2.0.2

  height value
1  T_0.1     1
2  T_0.2    12
3  T_0.3    14
4 T_10.1     5
5 T_10.2     9
6 T_10.3     4
7 T_36.1     1
8 T_36.2     2
9 T_36.3     7
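
One thing to be aware of: endsWith() does not test each string against every suffix. When the suffix argument has several elements, it is recycled along the input, so passing the whole `vars_list` gives the right rows only when the data happens to cycle through the suffixes in the same order. A minimal sketch of the difference, on made-up values in a different order from the sample data (the names `recycled` and `keep` are just for illustration):

```r
# Hypothetical values, deliberately not in .1/.2/.3 order
height <- c("T_0.3", "T_0.2", "T_0.1", "T_0.13", "T_0.12", "T_0.11")
vars_list <- c(".1", ".2", ".3")

# Recycled comparison: element i is checked against a single suffix only,
# so the result depends on the row order.
recycled <- endsWith(height, vars_list)
height[recycled]
# "T_0.2"

# Order-independent variant: test each suffix against all strings,
# then combine the logical vectors with OR.
keep <- Reduce(`|`, lapply(vars_list, function(s) endsWith(height, s)))
height[keep]
# "T_0.3" "T_0.2" "T_0.1"
```

The `keep` vector can be used in place of the direct endsWith() call inside filter() or in base subsetting.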

Upvotes: 3

SamR

Reputation: 20240

The dot is a regex metacharacter, which matches almost any single character rather than only a literal dot. To look for a literal dot you need to escape it, which in an R string means prefixing it with two backslashes: "\\."

However, your pattern will then match all rows in your sample data.
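
To illustrate, here is a small sketch on the `height` vector from the question. With the dots escaped but no anchor, every one of the 18 values contains ".1", ".2" or ".3" somewhere, so nothing is filtered out (`escaped` and `hits` are names chosen for this example):

```r
height <- c("T_0.1", "T_0.2", "T_0.3", "T_0.11", "T_0.12", "T_0.13",
            "T_10.1", "T_10.2", "T_10.3", "T_10.11", "T_10.12", "T_10.13",
            "T_36.1", "T_36.2", "T_36.3", "T_36.11", "T_36.12", "T_36.13")

# Dots escaped, but no end-of-string anchor yet
escaped <- c("\\.1", "\\.2", "\\.3")
hits <- grepl(paste(escaped, collapse = "|"), height)

sum(hits)  # 18: "T_0.13" still matches because it contains ".1"
```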

I assume you do not want to match, for example, "T_0.13", because it does not end with ".1", ".2" or ".3". In that case, add a dollar sign to each pattern. It anchors the match to the end of the string, so the string has to end with the pattern rather than just contain it.

vars_list <- c("\\.1$", "\\.2$", "\\.3$")

df_new <- df[grepl(paste(vars_list, collapse = "|"), df$height), ]
df_new
#    height value
# 1   T_0.1     1
# 2   T_0.2    12
# 3   T_0.3    14
# 7  T_10.1     5
# 8  T_10.2     9
# 9  T_10.3     4
# 13 T_36.1     1
# 14 T_36.2     2
# 15 T_36.3     7

Incidentally, another way you could express this is:

df[grepl("\\.[1-3]$", df$height),]

You can read more about the syntax used in regular expressions on R's built-in help page, ?regex.
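
If the real list of suffixes is long, one alternative that avoids building a regex from the list is to extract each value's suffix once and compare it with `%in%`. This is only a sketch, and it assumes the suffix is everything from the last dot to the end of the string; `suffix` is a name chosen for illustration:

```r
height <- c("T_0.1", "T_0.2", "T_0.3", "T_0.11", "T_0.12", "T_0.13",
            "T_10.1", "T_10.2", "T_10.3", "T_10.11", "T_10.12", "T_10.13",
            "T_36.1", "T_36.2", "T_36.3", "T_36.11", "T_36.12", "T_36.13")
value <- c(1, 12, 14, 15, 20, 22, 5, 9, 4, 0.0, 0.45, 0.7, 1, 2, 7, 100, 9, 45)
df <- data.frame(height, value)

# Plain suffixes, no regex escaping needed
vars_list <- c(".1", ".2", ".3")

# Keep only the part from the last dot onward, e.g. "T_10.13" -> ".13"
suffix <- sub("^.*(\\.[^.]*)$", "\\1", df$height)

# Exact comparison against the list, so ".13" is not confused with ".1"
df_new <- df[suffix %in% vars_list, ]
df_new
```

Because the comparison is exact, the entries in `vars_list` can stay as plain strings however many there are.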

Upvotes: 4
