Residium

Reputation: 301

R - partial string matching for new variable

I have a fairly large dataset with two text variables, A and B, where length(A) <= length(B). B is either A with some extra characters (in no particular order) or something completely different from A. I need to create a new variable in my data table under this condition: if B contains A, then C = TRUE. I believe partial string matching suits this better than ordinary string comparison.

My dataframe example:

Home      Pick  
Barc      Barcelona 0  
F Munch   FC munchen   
Lakers    Portland

I need to add new variable Side in this way:

Home     Pick         Side    
Barc     Barcelona 0  True  
F Munch  FC munchen   True  
Lakers   Portland     False  

I am trying to solve it with this:

data_n$Side <- stringMatch(data_n$Home, data_n$Pick, normalize = "YES")

but it gives all negative results.
However,

stringMatch('barcel', 'Barcelona 0', normalize='YES')    

gives the needed answer. Any hints as to where I am making a mistake?

Upvotes: 1

Views: 1216

Answers (1)

Rich Scriven

Reputation: 99331

I'm not sure how reliable it is, but agrepl, base R's approximate (fuzzy) pattern-matching function, seems to work on your data. Assuming dat is your original data:

## read in the original data
> txt <- "Home\tPick
  Barc\tBarcelona 0
  F Munch\tFC munchen
  Lakers\tPortland"
> dat <- read.table(text = txt, sep = '\t', header = TRUE,
                    strip.white = TRUE, stringsAsFactors = FALSE)
> dat
##      Home        Pick
## 1    Barc Barcelona 0
## 2 F Munch  FC munchen
## 3  Lakers    Portland

Then apply agrepl row by row, matching each Home value against the Pick value in the same row:

> d1 <- dat[,1]
> d2 <- dat[,2]
> dat$Side <- sapply(seq_len(nrow(dat)), function(i){
      agrepl(d1[i], d2[i], ignore.case = TRUE)
      })
> dat
##      Home        Pick  Side
## 1    Barc Barcelona 0  TRUE
## 2 F Munch  FC munchen  TRUE
## 3  Lakers    Portland FALSE
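
The same row-by-row comparison can also be written with mapply, which pairs the two columns element by element. This is only a sketch on the three example rows. The names exact and fuzzy are made up for illustration, and the strict substring check is included purely as a contrast to show why approximate matching is needed for a row like "F Munch".

```r
## rebuild the example data as plain character columns
dat <- data.frame(
  Home = c("Barc", "F Munch", "Lakers"),
  Pick = c("Barcelona 0", "FC munchen", "Portland"),
  stringsAsFactors = FALSE
)

## strict containment: is Home a literal, case-insensitive substring of Pick?
exact <- mapply(
  function(a, b) grepl(tolower(a), tolower(b), fixed = TRUE),
  dat$Home, dat$Pick, USE.NAMES = FALSE
)

## approximate containment: agrepl(pattern, x) tolerates a few edits,
## so the extra "C" in "FC munchen" does not prevent a match
fuzzy <- mapply(
  agrepl, dat$Home, dat$Pick,
  MoreArgs = list(ignore.case = TRUE), USE.NAMES = FALSE
)

dat$Side <- fuzzy
dat
```

With the default max.distance, agrepl allows a number of edits proportional to the pattern length, so very short Home values may match more loosely than you want. Passing a stricter max.distance through MoreArgs is one way to tighten that, but it is worth checking against a sample of your real data first.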

Upvotes: 1
