Gummy bears

Why don't these two R lines produce the same output?

I recently started fooling around with R, and for the life of me can't figure out why these two pieces of code don't produce the same output:

 data.short[which(str_detect(data.short$name, "Miss")),]
 data.short[which(grep("Miss", data.short$name) > 1),]

From my understanding of the definitions of str_detect and grep, these two lines should be essentially the same: both should keep only the entries that contain "Miss" in their names.

The first line does exactly that. The second one, however, does not do what I expect. Could someone please explain?

Answers (2)

KoenV

# for str_detect
library(stringr) 
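# some mock-up data to use; stringsAsFactors = TRUE keeps name as a factor
# (the default before R 4.0.0), which matches the outputs shown below
data.short <- data.frame(name = c(rep("Mister", 3), rep("Miss", 3)),
                         stringsAsFactors = TRUE)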

Firstly,

data.short[which(str_detect(data.short$name, "Miss")),]

returns (as expected):

[1] Miss Miss Miss
Levels: Miss Mister

Secondly,

data.short[which(grep("Miss", data.short$name) > 1),]

returns:

[1] Mister Mister Mister
Levels: Miss Mister

This is because the following call returns the positions of the matches:

grep("Miss", data.short$name)
[1] 4 5 6

and if you then ask which of those positions are larger than 1 (all of them are), you get their positions within that result, not within data.short:

which(grep("Miss", data.short$name) > 1)
[1] 1 2 3

so the subset finally selects the elements with index 1, 2, 3 (the result of the last call) and not the elements with index 4, 5, 6, which is what you probably intended:

data.short[which(grep("Miss", data.short$name) > 1),]
[1] Mister Mister Mister
Levels: Miss Mister
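To make the difference concrete, here is a minimal sketch that puts the buggy subset next to a corrected one, using only base R and the same mock-up data (with stringsAsFactors = TRUE assumed so the outputs match the ones above). The variable names idx, wrong and right are illustrative only. The fix is simply to use the positions returned by grep as the row index, without wrapping them in a comparison and which().

```r
# Same mock-up data as above (factor column, as in the outputs shown)
data.short <- data.frame(name = c(rep("Mister", 3), rep("Miss", 3)),
                         stringsAsFactors = TRUE)

# Positions of the matches within data.short: 4 5 6
idx <- grep("Miss", data.short$name)

# Buggy version: which() returns positions *within idx* (1 2 3), not rows 4 5 6
wrong <- data.short[which(idx > 1), ]

# Corrected version: use the positions from grep directly as the row index
right <- data.short[idx, ]

print(wrong)  # the three "Mister" entries
print(right)  # the three "Miss" entries
```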

As a side note, grep has an argument value that controls whether it returns the indices of the matches or the matching elements themselves:

> grep("Miss", data.short$name)
[1] 4 5 6
> grep("Miss", data.short$name, value = TRUE)
[1] "Miss" "Miss" "Miss" 
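A small sketch of how the two modes relate, assuming a plain character vector (the name vector here is illustrative, not from the question): for such a vector, value = TRUE gives the same result as indexing the vector with the positions that the default value = FALSE returns.

```r
# Illustrative character vector with the same contents as data.short$name
name <- c(rep("Mister", 3), rep("Miss", 3))

idx  <- grep("Miss", name)                # positions of the matches
vals <- grep("Miss", name, value = TRUE)  # the matching elements themselves

print(idx)   # 4 5 6
print(vals)  # "Miss" "Miss" "Miss"

# For a plain character vector, value = TRUE matches indexing by the positions
print(identical(vals, name[idx]))
```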

EDIT

Decomposing what happens with str_detect:

str_detect returns TRUE for those entries in which the pattern occurs in the string:

str_detect(data.short$name, "Miss")
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

which then returns the positions of the TRUE values:

which(str_detect(data.short$name, "Miss"))
[1] 4 5 6

and this, in turn, used as a row index, returns what you expect:

data.short[which(str_detect(data.short$name, "Miss")),]
[1] Miss Miss Miss
Levels: Miss Mister
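The same decomposition can be reproduced in base R with grepl, which, like str_detect, returns one TRUE/FALSE per element (note that grepl takes the pattern first, whereas str_detect takes the string first). This is a sketch using base R only, on an illustrative character vector; it shows that which() over the logical vector recovers exactly the positions grep returns directly.

```r
# Illustrative character vector with the same contents as data.short$name
name <- c(rep("Mister", 3), rep("Miss", 3))

hits <- grepl("Miss", name)  # FALSE FALSE FALSE TRUE TRUE TRUE
pos  <- which(hits)          # positions of the TRUE values: 4 5 6

print(hits)
print(pos)

# which() over the logical vector gives the same positions grep() returns
print(identical(pos, grep("Miss", name)))
```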

I hope this helps.

Ronak Shah

No, these two lines of code are not doing the same thing.

tl;dr

These two lines of code do the same thing:

data.short[which(str_detect(data.short$name, "Miss")),]

data.short[grep("Miss", data.short$name),]

In case you are interested in knowing why:

Let's take a reproducible example:

x <- c("one", "onetwo", "two", "threeone", "three")

Let's get the indices of the elements that have "one" in them.

  • str_detect

str_detect returns TRUE/FALSE values, so if we want indices we wrap which around it:

library(stringr)
which(str_detect(x, "one"))
#[1] 1 2 4

This is correct, as the vector elements at positions 1, 2 and 4 contain "one".

Now, let's move on to grep:

grep("one", x)
#[1] 1 2 4

This already gives the indices you want, without needing which.

However, when you are doing

grep("one", x) > 1

you are basically doing

c(1, 2, 4) > 1

which gives

[1] FALSE  TRUE  TRUE

since 2 and 4 are greater than 1, but 1 is not.

Now you wrap which around it, which gives you the positions of the TRUE values, 2 and 3 in this case:

which(grep("one", x) > 1)
#[1] 2 3
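The underlying point is that which() reports positions within the vector it is given, not the values stored in it. If you actually wanted the grep indices that exceed 1, you would subset the index vector with the logical vector instead. A small base R sketch on the same x (the name idx is illustrative):

```r
x <- c("one", "onetwo", "two", "threeone", "three")

idx <- grep("one", x)        # indices of elements containing "one": 1 2 4

print(which(idx > 1))        # positions *within idx* of the TRUE values: 2 3
print(idx[idx > 1])          # the index values themselves that exceed 1: 2 4
print(x[idx[idx > 1]])       # the corresponding elements: "onetwo" "threeone"
```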
