GonzaloReig


grepl for finding words

I am trying, in R, to find out which of a set of words are Spanish words. I have all the Spanish words in an Excel file (more than 80,000 words) that I don't know how to attach to the post, and I want to check whether certain words are in it or not.

For example:

words = c("Silla", "Sillas", "Perro", "asdfg")

I tried to use this solution:

grepl(paste(spanish_words, collapse = "|"), words) 

But there are too many Spanish words, and it gives me this error:

Error
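A likely cause, though the error text itself is not shown, is that one alternation pattern built from 80,000 words exceeds what R's regex engine can compile. The sketch below keeps the `grepl` idea but splits the dictionary into smaller chunks and combines the per-chunk results with `|`. The dictionary here is a small hypothetical stand-in for the Excel list, and `grepl_chunked` and its `chunk_size = 1000` default are assumptions to tune, not a known limit.

```r
# Hypothetical small dictionary standing in for the 80,000-word Excel list
spanish_words <- c("silla", "perro", "mesa", "casa")
words <- c("Silla", "Sillas", "Perro", "asdfg")

# Hypothetical helper: test `x` against a long list of alternatives by
# building several smaller "a|b|c" patterns instead of one huge one.
grepl_chunked <- function(dict, x, chunk_size = 1000) {
  # Group the dictionary into chunks of at most `chunk_size` words
  chunks <- split(dict, ceiling(seq_along(dict) / chunk_size))
  # One logical vector per chunk: does each element of x contain any word of the chunk?
  hits <- lapply(chunks, function(chunk) {
    grepl(paste(chunk, collapse = "|"), x, ignore.case = TRUE)
  })
  # An element of x matches if any chunk matched it
  Reduce(`|`, hits)
}

grepl_chunked(spanish_words, words)
#> [1]  TRUE  TRUE  TRUE FALSE
```

Note that this is substring matching, just like the original `grepl` call: with a full dictionary, very short entries (for example "a" or "y") will match inside almost any string, including "asdfg".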

So... how can I do it? I also tried this:

toupper(words) %in% toupper(spanish_words)

Result

As you can see, this option only returns TRUE for exact matches, and I need "Sillas" to be TRUE as well (it is the plural of "silla"). That is why I first tried grepl: so that plurals would match too.
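One way to keep the fast exact lookup of `%in%` while still accepting plurals is to also try each word with a trailing "s" or "es" removed. This is a rough heuristic, not real Spanish morphological analysis (irregular forms and accent changes such as "león"/"leones" are not handled). The helper `is_spanish` and the small dictionary are hypothetical stand-ins.

```r
spanish_words <- c("silla", "perro", "flor", "clase")   # hypothetical stand-in dictionary
words <- c("Silla", "Sillas", "Perro", "asdfg", "Flores", "Clases")

# Hypothetical helper: a word counts as Spanish if it, or the same word with a
# trailing "s" or "es" stripped, is found exactly in the dictionary.
is_spanish <- function(x, dict) {
  x <- tolower(x)
  dict <- tolower(dict)
  x %in% dict |                   # exact match, e.g. "silla"
    sub("s$", "", x) %in% dict |  # plural in -s, e.g. "sillas" -> "silla"
    sub("es$", "", x) %in% dict   # plural in -es, e.g. "flores" -> "flor"
}

is_spanish(words, spanish_words)
#> [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```

Because `%in%` does exact lookups rather than regex matching, it avoids building a giant pattern and does not produce false positives from short words hidden inside longer strings.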

Any idea?

Upvotes: 0

Views: 545

Answers (1)

tvdo


With the text in a data frame (a tibble):

library(dplyr)    # tibble(), mutate() and %>%
library(stringr)  # str_detect()

df <- tibble(text = c("some words", 
                      "more words", 
                      "Perro", 
                      "And asdfg", 
                      "Comb perro and asdfg"))

Vector of words, collapsed into one lower-case pattern:

words <- c("Silla", "Sillas", "Perro", "asdfg")
words <- tolower(paste(words, collapse = "|"))

Then use mutate and str_detect:

df %>% 
  mutate(
    text = tolower(text),                    # lower-case so it matches the lower-case pattern
    spanish_word = str_detect(text, words)   # TRUE if any word of the pattern occurs in text
  )
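A possible refinement, sketched here with base R's `grepl` swapped in for `str_detect` so it needs no packages: wrap the alternatives in word boundaries (`\\b`) so that "perro" does not match inside an unrelated longer word, allow an optional plural ending with `(e?s)?`, and use `ignore.case = TRUE` instead of lower-casing the text. The extra rows "Dos sillas" and "perrolandia" are hypothetical examples. This is still a single regex, so a very long list may still need to be split into smaller patterns.

```r
df <- data.frame(text = c("some words", "more words", "Perro",
                          "And asdfg", "Comb perro and asdfg",
                          "Dos sillas", "perrolandia"))
words <- c("Silla", "Perro", "asdfg")

# \\b ties each alternative to whole words; "(e?s)?" optionally accepts a
# plural ending, so "sillas" matches "Silla" but "perrolandia" does not match "Perro".
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")(e?s)?\\b")

# ignore.case = TRUE keeps the original text unchanged in the output
df$spanish_word <- grepl(pattern, df$text, ignore.case = TRUE, perl = TRUE)
df
```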

Returns:

text                 spanish_word
  <chr>                <lgl>       
1 some words           FALSE       
2 more words           FALSE       
3 perro                TRUE        
4 and asdfg            TRUE        
5 comb perro and asdfg TRUE    

Upvotes: 1
