Elli Schr

Reputation: 21

Match a substring exactly in a table of strings

I have a table with many plant names; the column I search looks like this:

|Parmelia sulcata, Xanthoria parietina, Lecanora muralis|
|Lecanora muralis var. saxicola, Lecanora hagenii|

I want to search for a species in it, e.g. Lecanora muralis (sp <- "Lecanora muralis").

Currently, I search through the table with a for-loop.

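for (g in 1:nrow(table))
{
  # grep() returns the positions in table[g, 5] whose text contains sp
  such_syn <- grep(sp, table[g, 5])

  if (length(such_syn) > 0)
  {
    syn <- table[g, 5]

    # exact comparison of sp against the whole cell
    selbe <- which(sp == syn)
    if (length(selbe) > 0)
    {
      # ... process the exact match here
    }
  }
}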

I want to match my species "Lecanora muralis" exactly.

I have tried:

With grep, it matches row 1 (that's OK) and row 2 (that's not OK, because that entry is the variety saxicola).
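A minimal reproduction of this, assuming the column is a plain character vector (the name `cells` is a hypothetical stand-in for the table column): `grepl()` is a substring test, so the variety entry in row 2 counts as a hit too.

```r
# Hypothetical stand-in for the column that holds the species lists
cells <- c("Parmelia sulcata, Xanthoria parietina, Lecanora muralis",
           "Lecanora muralis var. saxicola, Lecanora hagenii")
sp <- "Lecanora muralis"

# grepl() only asks "does the text contain sp?", so both rows report a hit
substring_hits <- grepl(sp, cells, fixed = TRUE)
print(substring_hits)  # TRUE TRUE
```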

I tried it with which, but syn is a single character string that looks like this:

syn <- "Parmelia sulcata, Xanthoria parietina, Lecanora muralis"

so which(sp == syn) doesn't work: == compares sp with the whole string, which is not equal to it.

Then I tried strsplit(syn, ","), which (taking the first element of the returned list) gives:

syn <- c("Parmelia sulcata", " Xanthoria parietina", " Lecanora muralis")

But the pieces keep their leading spaces, so the exact comparison fails again.

And I cannot simply remove the spaces with gsub, because that also removes the space between genus and epithet, joining the words together (e.g. "Lecanoramuralis").
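A small base-R sketch of the difference: `gsub(" ", "", ...)` deletes every space, while `trimws()` only strips whitespace at the start and end of each piece, which is what the split entries need. The variable names here are illustrative only.

```r
syn <- "Parmelia sulcata, Xanthoria parietina, Lecanora muralis"

# strsplit() returns a list; [[1]] takes the pieces of the first (only) string
parts <- strsplit(syn, ",")[[1]]
# parts keeps the leading blanks: " Xanthoria parietina", " Lecanora muralis"

# gsub() removes every space, including the one between genus and epithet
squashed <- gsub(" ", "", parts)
print(squashed[3])  # "Lecanoramuralis"

# trimws() strips only leading/trailing whitespace from each piece
trimmed <- trimws(parts)
print("Lecanora muralis" %in% trimmed)  # TRUE
```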

How can I match my species?

Upvotes: 1

Views: 74

Answers (1)

Spacedman

Reputation: 94172

Split it on the comma, trim off the whitespace, and do an equality test:

Test with a variant:

> require(stringr) # install this handy string-processing package if you don't have it
> syn <- "Parmelia sulcata, Xanthoria parietina, Lecanora muralis var foo"

Doesn't match:

> any("Lecanora muralis" == str_trim(str_split(syn,",")[[1]]))
[1] FALSE

Without the variant, it returns TRUE:

> syn <- "Parmelia sulcata, Xanthoria parietina, Lecanora muralis"
> any("Lecanora muralis" == str_trim(str_split(syn,",")[[1]]))
[1] TRUE

With extra spaces and an extra entry, it is still TRUE:

> syn <- "Parmelia sulcata, Xanthoria parietina, Lecanora muralis ,something else"
> any("Lecanora muralis" == str_trim(str_split(syn,",")[[1]]))
[1] TRUE

Write it as a function for neatness:

> exmatch = function(target, clist){any(target == str_trim(str_split(clist,",")[[1]]))}
> exmatch("Lecanora muralis", syn)
[1] TRUE
> exmatch("Lecanora muralis var foo", syn)
[1] FALSE
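To replace the for loop from the question, the same split/trim/compare idea can be applied to every row at once. This is a sketch assuming the species lists sit in a character vector; `exmatch_base` is a hypothetical base-R counterpart of exmatch that uses `trimws()`/`strsplit()` instead of stringr, and `cells` stands in for the table column.

```r
# Hypothetical base-R counterpart of exmatch(): split on commas,
# trim surrounding whitespace, then test for exact equality
exmatch_base <- function(target, clist) {
  any(target == trimws(strsplit(clist, ",")[[1]]))
}

# Hypothetical stand-in for the species column (table[, 5] in the question)
cells <- c("Parmelia sulcata, Xanthoria parietina, Lecanora muralis",
           "Lecanora muralis var. saxicola, Lecanora hagenii")

# Check every row at once instead of looping with for()
hits <- vapply(cells, function(cl) exmatch_base("Lecanora muralis", cl),
               logical(1), USE.NAMES = FALSE)
print(hits)         # TRUE FALSE
print(which(hits))  # 1
```

vapply() is used rather than sapply() because it guarantees a plain logical vector, one value per row.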

This also means that if you get a better answer here, make sure they call their function exmatch, so you can replace the definition without having to rewrite the rest of your code.

Upvotes: 1
