lokheart

Reputation: 24685

Subsetting a data frame in R using two criteria, one of which is a regular expression

I have a dataset like this:

col_a col_b    col_c
1     abc_boy  1
2     abc_boy  2
1     abc_girl 1
2     abc_girl 2
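To reproduce the example, the table above can be built as a data frame with something like the following. This is a sketch that assumes col_a and col_c are stored as numbers, since the table does not say. The name df matches the code further down.

```r
# Rebuild the sample table shown above as a data frame named df
df <- data.frame(
  col_a = c(1, 2, 1, 2),
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c(1, 2, 1, 2),          # assumed numeric; comparing with "1" still works via coercion
  stringsAsFactors = FALSE        # keep col_b as character (the default was TRUE before R 4.0)
)
```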

I need to select only the first row, based on col_b and col_c, and then change the value in col_c. My attempt looks like this:

df[grep("_boy$",df[,"col_b"]) & df[,"col_c"]=="1","col_c"] <- "yes"

But the code above does not work, because the first criterion and the second criterion do not come from the same set.

I can do it the clumsy way with an explicit loop, or with "two-tier" subsetting, something like this:

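df.a <- df[grep("_boy$", df[,"col_b"]),]               #1: rows whose col_b ends in "_boy"
df.b <- df[grep("_boy$", df[,"col_b"], invert=TRUE),]  #2: all other rows
df.a[df.a[,"col_c"]=="1","col_c"] <- "yes"             #3: recode 1 to "yes" in the "_boy" rows
df.a[df.a[,"col_c"]=="2","col_c"] <- "no"              #4: recode 2 to "no" in the "_boy" rows
df <- rbind(df.a, df.b)                                #5: put the pieces back together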

But I would prefer not to. Can anyone show me how to "merge" #1 and #3? Thanks.

Upvotes: 4

Views: 1751

Answers (2)

IRTFM

Reputation: 263481

The reason it does not work as expected, even though the logic is correct, is that you are using grep where you should be using grepl. Try this instead:

df[ grepl("_boy$", df[,"col_b"]) & df[,"col_c"]=="1", "col_c"] <- "yes"

> df
  col_a    col_b col_c
1     1  abc_boy   yes
2     2  abc_boy     2
3     1 abc_girl     1
4     2 abc_girl     2

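grepl returns a logical vector with one element per element of its input, whereas grep returns an integer vector of the indices of the matches. Here that is the shorter vector c(1, 2); in the & operation its nonzero values are treated as TRUE and recycled to length 4, so the condition also selects row 3 (abc_girl).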
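A small sketch that makes the difference visible on the sample data from the question (rebuilt here with col_c assumed numeric). The variable names idx, hit, is_one, bad and good are only for illustration.

```r
df <- data.frame(
  col_a = c(1, 2, 1, 2),
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

idx    <- grep("_boy$", df[, "col_b"])    # integer indices of the matches: c(1, 2)
hit    <- grepl("_boy$", df[, "col_b"])   # one logical per row: TRUE TRUE FALSE FALSE
is_one <- df[, "col_c"] == "1"            # TRUE FALSE TRUE FALSE

# With grep, c(1, 2) is recycled to length 4 and every nonzero value counts as TRUE,
# so the combined condition reduces to is_one and also picks row 3 (abc_girl)
bad <- idx & is_one                       # TRUE FALSE TRUE FALSE

# With grepl, both vectors have one element per row, so & compares row by row
good <- hit & is_one                      # TRUE FALSE FALSE FALSE
```

No warning is raised in the grep case, because 4 is a multiple of 2, which is why the mistake is easy to miss.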

Upvotes: 6

rcs

Reputation: 68849

Try grepl instead of grep. grepl returns a logical vector (match or not for each element of x), which can be combined with logical operators.
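As a sketch of how this covers the whole two-tier workaround from the question, steps #1 to #5 collapse into two assignments that each combine a grepl match with a logical test on col_c. It assumes the sample data with col_c stored as numeric, and follows the question's mapping of 1 to "yes" and 2 to "no"; the helper name boy is hypothetical.

```r
df <- data.frame(
  col_a = c(1, 2, 1, 2),
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

boy <- grepl("_boy$", df[, "col_b"])   # one TRUE/FALSE per row

# Each line merges step #1 with step #3 or #4: a single row-aligned logical condition
df[boy & df[, "col_c"] == "1", "col_c"] <- "yes"
df[boy & df[, "col_c"] == "2", "col_c"] <- "no"
```

Unlike the rbind approach, the rows keep their original order. Note that col_c becomes a character column once "yes" is assigned into it, so the remaining values are the strings "1" and "2".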

Upvotes: 6
