Reputation: 188
I recently started fooling around with R, and for the life of me I can't figure out why these two pieces of code don't produce the same output:
data.short[which(str_detect(data.short$name, "Miss")),]
data.short[which(grep("Miss", data.short$name) > 1),]
From the definitions of the two functions str_detect and grep, I understand that these two lines are essentially the same: they keep only the entries that contain "Miss" in their names.
The first line does exactly that. The second, however, does not. Could someone please explain?
Upvotes: 1
Views: 74
Reputation: 4283
# for str_detect
library(stringr)
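# some mock-up data to use; stringsAsFactors = TRUE makes name a factor,
# matching the "Levels:" output below (R >= 4.0 defaults to character columns)
data.short <- data.frame(name = c(rep("Mister", 3), rep("Miss", 3)),
                         stringsAsFactors = TRUE)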
Firstly,
data.short[which(str_detect(data.short$name, "Miss")),]
returns (as expected):
[1] Miss Miss Miss
Levels: Miss Mister
Secondly,
data.short[which(grep("Miss", data.short$name) > 1),]
returns:
[1] Mister Mister Mister
Levels: Miss Mister
This is because the following call returns:
grep("Miss", data.short$name)
[1] 4 5 6
and if you then ask which of those indices are larger than 1, you get:
which(grep("Miss", data.short$name) > 1)
[1] 1 2 3
so the final subset selects the rows with indices 1, 2, 3 (the result of the last call), not the rows with indices 4, 5, 6 that you probably intended:
data.short[which(grep("Miss", data.short$name) > 1),]
[1] Mister Mister Mister
Levels: Miss Mister
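The chain of calls above can be checked one step at a time. A minimal sketch in base R, rebuilding the mock-up data with name kept as a character column (an assumption; the factor version behaves the same way for indexing). The names idx and pos are hypothetical, chosen only for illustration:

```r
# Mock-up data; name kept as a character column (assumption for this sketch)
data.short <- data.frame(name = c(rep("Mister", 3), rep("Miss", 3)),
                         stringsAsFactors = FALSE)

idx <- grep("Miss", data.short$name)  # row positions of the matches: 4 5 6
pos <- which(idx > 1)                 # positions *within idx*, not rows: 1 2 3

print(idx)
print(pos)
print(data.short[pos, ])              # rows 1-3: the "Mister" rows
print(data.short[idx, ])              # rows 4-6: the "Miss" rows
```

The key point is that which(idx > 1) answers "which elements of idx are greater than 1", so its result describes positions inside idx rather than rows of data.short.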
As a side note: grep has an argument value that you can set to return either the indices of the matches (the default) or the matching elements themselves:
> grep("Miss", data.short$name)
[1] 4 5 6
> grep("Miss", data.short$name, value = TRUE)
[1] "Miss" "Miss" "Miss"
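In other words, value = TRUE is shorthand for indexing the vector with the default result. A minimal sketch of that equivalence, assuming a character name column; idx and vals are hypothetical names:

```r
# Mock-up data; name kept as a character column (assumption for this sketch)
data.short <- data.frame(name = c(rep("Mister", 3), rep("Miss", 3)),
                         stringsAsFactors = FALSE)

idx  <- grep("Miss", data.short$name)                # indices of the matches
vals <- grep("Miss", data.short$name, value = TRUE)  # the matching elements

# Indexing the column with idx gives the same elements as value = TRUE
print(identical(vals, data.short$name[idx]))
```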
Decomposing what happens with str_detect:
str_detect returns TRUE for those entries where the pattern occurs in the string:
str_detect(data.short$name, "Miss")
[1] FALSE FALSE FALSE TRUE TRUE TRUE
which returns the indices of the TRUE values:
which(str_detect(data.short$name, "Miss"))
[1] 4 5 6
and this, in turn, used as a row index, returns what you expect:
data.short[which(str_detect(data.short$name, "Miss")),]
[1] Miss Miss Miss
Levels: Miss Mister
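The same decomposition can be reproduced in base R without stringr. A minimal sketch, assuming grepl is an acceptable stand-in for str_detect (it also returns one TRUE/FALSE per element) and keeping name as a character column. It also shows that which is not strictly required, because a logical vector can index rows directly. One reason to keep which anyway: str_detect returns NA for a missing name, which silently drops NA, whereas a logical index containing NA would produce a row of NAs.

```r
# Mock-up data; name kept as a character column (assumption for this sketch)
data.short <- data.frame(name = c(rep("Mister", 3), rep("Miss", 3)),
                         stringsAsFactors = FALSE)

# grepl(): base R's logical counterpart to str_detect()
hits <- grepl("Miss", data.short$name)
print(hits)

# Logical indexing keeps rows where hits is TRUE;
# which() first converts hits into the row positions 4, 5, 6
by_logical <- data.short[hits, ]
by_which   <- data.short[which(hits), ]
print(identical(by_logical, by_which))
```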
I hope this helps.
Upvotes: 3
Reputation: 388862
No, these two codes are not doing the same thing.
tl;dr
These two lines of code do the same thing:
data.short[which(str_detect(data.short$name, "Miss")),]
data.short[grep("Miss", data.short$name),]
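A quick way to confirm that the two tl;dr lines select the same rows, sketched on a small hypothetical mock-up of data.short (name kept as a character column; assumes stringr is installed):

```r
library(stringr)

# Hypothetical mock-up data; name kept as a character column
data.short <- data.frame(name = c(rep("Mister", 3), rep("Miss", 3)),
                         stringsAsFactors = FALSE)

rows_stringr <- which(str_detect(data.short$name, "Miss"))  # integer row positions
rows_grep    <- grep("Miss", data.short$name)                # integer row positions

print(rows_stringr)
print(rows_grep)
print(data.short[rows_grep, ])
```

Both expressions produce the same integer vector of row positions, so either one can be used directly as the row index.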
In case you are interested in knowing why:
Let's take a reproducible example:
x <- c("one", "onetwo", "two", "threeone", "three")
Let's get the indices of the elements that have "one" in them.
str_detect
str_detect returns TRUE/FALSE values, so if we want indices we wrap which around it:
library(stringr)
which(str_detect(x, "one"))
#[1] 1 2 4
This is correct, as the vector elements at positions 1, 2 and 4 have "one" in them.
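stringr also has str_which, which wraps exactly this which(str_detect(...)) pattern and returns the matching indices directly. A minimal sketch on the same x (assumes stringr 1.3.0 or later, where str_which is available):

```r
library(stringr)

x <- c("one", "onetwo", "two", "threeone", "three")

# str_which() returns the indices of the elements that match the pattern
print(str_which(x, "one"))
print(which(str_detect(x, "one")))
```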
Now, let's move on to grep:
grep("one", x)
#[1] 1 2 4
This already gives the output you want.
However, when you are doing
grep("one", x) > 1
you are basically doing
c(1, 2, 4) > 1
which gives
[1] FALSE TRUE TRUE
since 2 and 4 are greater than 1, but 1 is not.
Now you wrap which around it, which gives you the indices of the TRUE values, i.e. 2 and 3 in this case:
which(grep("one", x) > 1)
#[1] 2 3
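Putting it together on x, a minimal base-R sketch contrasting what the question's expression selects with what was intended (idx is a hypothetical name for illustration):

```r
x <- c("one", "onetwo", "two", "threeone", "three")

idx <- grep("one", x)        # positions in x that match: 1 2 4
print(idx > 1)               # compares the positions themselves: FALSE TRUE TRUE
print(which(idx > 1))        # positions within idx, not within x: 2 3

print(x[which(idx > 1)])     # "onetwo" "two": the wrong elements
print(x[idx])                # "one" "onetwo" "threeone": what was intended
```

So the fix is simply to drop the which(... > 1) wrapper and use the result of grep as the index.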
Upvotes: 4