titeuf

Reputation: 163

matches() and contains() behave differently for "." (dplyr)

I do not really understand why these two dplyr functions behave differently. Why does matches() include a "wrong" variable?


library(dplyr)

# create dataframe
df <- data.frame(agr_barriers.crop_disease = 1,
                 agr_barriers.lack_mats = 1,
                 agr_barriers.sickness = 1,
                 agr_barriers_mats.dontknow = 1)

# 1: select with contains()
df2 <- df %>%
  select(contains("agr_barriers."))
colnames(df2)

# 2: select with matches()
df3 <- df %>%
  select(matches("agr_barriers."))
colnames(df3)




Upvotes: 0

Views: 40

Answers (1)

Karthik S

Reputation: 11584

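Because, per the documentation, "matches(): Matches a regular expression", whereas contains() looks for a literal string. In a regular expression, the dot (.) in "agr_barriers." matches any single character, including the underscore. That's why matches() also returns agr_barriers_mats.dontknow.

For example: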
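
The same distinction can be seen in base R, without dplyr. This is only an analogy, not how the selection helpers are implemented: grepl() treats its pattern as a regular expression by default and as a literal string when fixed = TRUE. The vector name cols is just a placeholder holding the column names from the question.

```r
# Column names from the question
cols <- c("agr_barriers.crop_disease",
          "agr_barriers.lack_mats",
          "agr_barriers.sickness",
          "agr_barriers_mats.dontknow")

# Regular expression: "." stands for any single character,
# so the "_" after "agr_barriers" in the last name also matches
regex_hits <- grepl("agr_barriers.", cols)

# Literal string: "." has to be an actual dot
literal_hits <- grepl("agr_barriers.", cols, fixed = TRUE)

cols[regex_hits]    # all four names
cols[literal_hits]  # only the first three names
```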

> df <- data.frame(agr_barriers.crop_disease = 1, 
+                  agr_barriers.lack_mats = 1, 
+                  agr_barrierss.sickness = 1, 
+                  agr_barriers_mats.dontknow = 1)
> ###1 select dataframe
> df2 <-  df %>%
+   select(contains("agr_barriers."))
> colnames(df2)
[1] "agr_barriers.crop_disease" "agr_barriers.lack_mats"   
> ###2 select dataframe
> df3 <-  df %>%
+   select(matches("agr_barriers."))
> colnames(df3)
[1] "agr_barriers.crop_disease"  "agr_barriers.lack_mats"     "agr_barrierss.sickness"     "agr_barriers_mats.dontknow"

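I added an extra 's' to the third column name in df (agr_barrierss.sickness). Now select(contains("agr_barriers.")) no longer returns the sickness column, as colnames(df2) shows, because the literal string "agr_barriers." does not occur in that name. matches() still returns it, because the dot matches the extra 's'.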
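
If the goal is to keep using matches() but have the dot treated literally, one option is to escape it in the pattern. A minimal sketch, assuming the original data frame from the question (without the extra 's'); df4 and df5 are just illustrative names:

```r
library(dplyr)

# Data frame from the question
df <- data.frame(agr_barriers.crop_disease = 1,
                 agr_barriers.lack_mats = 1,
                 agr_barriers.sickness = 1,
                 agr_barriers_mats.dontknow = 1)

# "\\." in an R string is the regex "\.", i.e. a literal dot
df4 <- df %>%
  select(matches("agr_barriers\\."))
colnames(df4)

# starts_with() also takes a literal string, so no escaping is needed
df5 <- df %>%
  select(starts_with("agr_barriers."))
colnames(df5)
```

Both calls should return only the three agr_barriers. columns and leave out agr_barriers_mats.dontknow.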

Upvotes: 3
