yrx1702

Reputation: 1641

Using grepl() in a for loop to fuzzy match

I have two data frames that look like this:

matcher<-data.frame(matcher.nation=c("","",""),matcher.var=c("test one","test two", "example one"))
matcher <- data.frame(lapply(matcher, as.character), stringsAsFactors=FALSE)
matcher
  matcher.nation matcher.var
1                   test one
2                   test two
3                example one

and

df<-data.frame(var=c("test","example"),nation=c("AFG","BEL"))
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
df
      var nation
1    test    AFG
2 example    BEL

Now I want to do some kind of fuzzy matching to fill in matcher$matcher.nation. Therefore, I've written the following loop:

for (i in length(df$var)){
  matcher$matcher.nation[grepl(paste(".*",df$var[i],".*",sep=""),
                               matcher$matcher.var)]<-df$nation[i]
}

that is supposed to iterate through df$var, compare it to matcher$matcher.var, and match df$nation to matcher$matcher.nation if the expression is found in matcher (no matter what comes before or after the expression).

If I do this, it matches just one nation:

matcher
  matcher.nation matcher.var
1                   test one
2                   test two
3            BEL example one

However, if I do it manually for i=1 (i.e. use "test" in grepl), it works perfectly fine:

matcher$matcher.nation[grepl(paste(".*","test",".*",sep=""),matcher$matcher.var)]<-"AFG"
matcher
  matcher.nation matcher.var
1            AFG    test one
2            AFG    test two
3            BEL example one

If anyone could point me in the direction of what's wrong with my loop that would be nice. Thanks!

Upvotes: 0

Views: 416

Answers (1)

user1310503

Reputation: 577

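The header for (i in length(df$var)) loops over a single value, length(df$var), which is 2 here. The body therefore runs once with i = 2, and only "example"/"BEL" is ever matched. It should be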

for (i in 1:length(df$var)) {

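Or, even better, because it also does the right thing when df$var is empty (1:0 counts down through 1 and 0 instead of skipping the loop),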

for (i in seq_along(df$var)) {
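
To see the difference between the three loop headers, it can help to look at what each expression evaluates to. This is only a small sketch; the variable names (x, empty, only_last and so on) are made up for illustration and are not from the question.

```r
x <- c("test", "example")
empty <- character(0)

only_last  <- length(x)         # 2: a single value, so a loop over it runs once
one_to_n   <- 1:length(x)       # 1 2
along      <- seq_along(x)      # 1 2

bad_empty  <- 1:length(empty)   # 1 0: a loop over this would run twice
good_empty <- seq_along(empty)  # integer(0): a loop over this would not run
```

For a non-empty vector the last two forms give the same indices; they only differ in the empty case.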

An extra tip: instead of paste(..., sep=""), you can use paste0(...).
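
Putting both suggestions together, the loop from the question might look like the sketch below. The data frames are the ones from the question, built with stringsAsFactors = FALSE directly; the intermediate variable hits is only introduced here for readability.

```r
# Data from the question, created as character columns
matcher <- data.frame(matcher.nation = c("", "", ""),
                      matcher.var = c("test one", "test two", "example one"),
                      stringsAsFactors = FALSE)
df <- data.frame(var = c("test", "example"),
                 nation = c("AFG", "BEL"),
                 stringsAsFactors = FALSE)

# Visit every row of df, not just the last one
for (i in seq_along(df$var)) {
  # TRUE for each matcher.var that contains df$var[i]
  hits <- grepl(paste0(".*", df$var[i], ".*"), matcher$matcher.var)
  matcher$matcher.nation[hits] <- df$nation[i]
}

matcher$matcher.nation
```

With the data above, this should fill in AFG for both "test" rows and BEL for the "example" row. As a side note, grepl() already matches anywhere in the string, so the surrounding ".*" is probably not needed; grepl(df$var[i], matcher$matcher.var) should give the same result here.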

Upvotes: 1
