lolo

Reputation: 646

Matching with regular expressions in R

Suppose I have the following data frame.

table <- data.frame(
  col1 = c("4-p", "4-p 1.0", "2-p", "4-p 1.6", "2-p 1.0"),
  col2 = c("4-p 1.0", "2-p 1.0", "1.6 2-p", "4-p 1.8", "1.0 2-p civic"),
  p_ok = c("Y", "N", "Y", "Y", "Y"),
  n_ok = c("N", "Y", "N", "N", "Y")
)

    col1          col2 p_ok n_ok
     4-p       4-p 1.0    Y    N
 4-p 1.0       2-p 1.0    N    Y
     2-p       1.6 2-p    Y    N
 4-p 1.6       4-p 1.8    Y    N
 2-p 1.0 1.0 2-p civic    Y    Y

I have to implement a method to determine whether the columns are similar or not (p_ok and n_ok).

The rules are: if the number followed by "-p" in col1 is the same as the one in col2, p_ok is 'Y', otherwise 'N'. If the other number (1.0, 1.6, 1.8) is the same in both columns, n_ok is 'Y', otherwise 'N'. Note that the order within the string can change (see row 5).

Bear in mind that the real data contains many variants of both values (2-p, 3-p, 4-p, 5-p) and (1.0, 2.0, ...), so regular expressions will be needed to determine whether the columns are similar or not (p_ok and n_ok).

Can anyone help me with this?

Upvotes: 0

Views: 45

Answers (1)

akrun

Reputation: 887851

We can do this by first moving the 'p' substring in front of the number, using sub, wherever the number comes first. Elements that have no second number get a 0 appended as a placeholder. We then split each string in two with strsplit, rbind the pieces into one matrix per column, and use Reduce with `==` to compare the two matrices element-wise, which gives a logical matrix. If needed, we can replace the logical matrix with Y/N using ifelse.

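# Normalize each column to "<n>-p <number>", split on the space, and compare
# the two resulting matrices element-wise (column 1: p_ok, column 2: n_ok).
res <- Reduce(`==`, lapply(table[1:2], function(x) do.call(rbind, 
       strsplit(sub("^([A-Za-z0-9-]+)\\b$", "\\1 0",   # no number: append " 0"
        sub("^([0-9.]+)\\s+([0-9]+-p).*", "\\2 \\1", x)), " "))))  # number first: swap
ifelse(res, "Y", "N")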
#   [,1] [,2]
#[1,] "Y"  "N" 
#[2,] "N"  "Y" 
#[3,] "Y"  "N" 
#[4,] "Y"  "N" 
#[5,] "Y"  "Y" 
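
A possible alternative, shown only as a sketch: instead of rearranging the strings, extract the two tokens from each column with regexpr/regmatches and compare them directly. This assumes the "-p" token always looks like digits followed by "-p" and that the other number always contains a decimal point. The helper extract_first and the names df, p_pat and n_pat are made up for this example and are not part of the code above.

```r
# Hypothetical helper: first match of `pattern` in each element of `x`,
# or NA where the pattern does not occur.
extract_first <- function(x, pattern) {
  x <- as.character(x)
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)  # regmatches returns matched elements only
  out
}

df <- data.frame(
  col1 = c("4-p", "4-p 1.0", "2-p", "4-p 1.6", "2-p 1.0"),
  col2 = c("4-p 1.0", "2-p 1.0", "1.6 2-p", "4-p 1.8", "1.0 2-p civic"),
  stringsAsFactors = FALSE
)

p_pat <- "[0-9]+-p"         # assumed shape of the "-p" token, e.g. "2-p"
n_pat <- "[0-9]+\\.[0-9]+"  # assumed shape of the other number, e.g. "1.0"

p1 <- extract_first(df$col1, p_pat)
p2 <- extract_first(df$col2, p_pat)
n1 <- extract_first(df$col1, n_pat)
n2 <- extract_first(df$col2, n_pat)

# A missing token on either side counts as "N".
p_ok <- ifelse(!is.na(p1) & !is.na(p2) & p1 == p2, "Y", "N")
n_ok <- ifelse(!is.na(n1) & !is.na(n2) & n1 == n2, "Y", "N")

p_ok
# [1] "Y" "N" "Y" "Y" "Y"
n_ok
# [1] "N" "Y" "N" "N" "Y"
```

Because the tokens are located by pattern rather than by position, the order inside the string and any trailing text (such as "civic" in row 5) do not matter. The comparison is on strings, so "1.0" and "1.00" would count as different; wrapping n1 and n2 in as.numeric would be one way to change that.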

Upvotes: 1
