Reputation: 646
Suppose I have the following data frame.
table<-data.frame(col1=c("4-p","4-p 1.0","2-p","4-p 1.6","2-p 1.0"),col2=c("4-p 1.0","2-p 1.0","1.6 2-p","4-p 1.8","1.0 2-p civic"), p_ok=c("Y","N","Y","Y","Y"), n_ok=c("N","Y","N","N","Y"))
col1     col2           p_ok  n_ok
4-p      4-p 1.0        Y     N
4-p 1.0  2-p 1.0        N     Y
2-p      1.6 2-p        Y     N
4-p 1.6  4-p 1.8        Y     N
2-p 1.0  1.0 2-p civic  Y     Y
I have to implement a method to determine whether the columns are similar or not (p_ok and n_ok).
The rules are: if the number plus "-p" in col1 is equal to the one in col2, p_ok is 'Y', otherwise 'N'. If the other number (1.0, 1.6, 1.8) is the same in both columns, n_ok is 'Y', otherwise 'N'. Notice that the order within the string can change (look at row 5).
Bear in mind that the real data contains multiple variants (2-p, 3-p, 4-p, 5-p) and (1.0, 2.0, ...), so regular expressions would be necessary in this exercise.
Can anyone help me with this?
Upvotes: 0
Views: 45
Reputation: 887851
We can do this by switching the order of the 'p' substring and the number using sub, so that the 'p' substring always comes first. Then, for elements that don't have a number, we append 0 as a placeholder, split each string into two using strsplit, and Reduce the resulting list of matrices to a logical matrix by comparing them elementwise. If needed, we can replace the logical matrix with Y/N.
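# Inner sub: move a leading number behind the "N-p" token (drops trailing words).
# Outer sub: pad strings that contain only the "N-p" token with " 0".
res <- Reduce(`==`, lapply(table[1:2], function(x) do.call(rbind,
         strsplit(sub("^([A-Za-z0-9-]+)\\b$", "\\1 0",
                      sub("^([0-9.]+)\\s+([0-9]+-p).*", "\\2 \\1", x)), " "))))
ifelse(res, "Y", "N")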
#     [,1] [,2]
#[1,] "Y"  "N"
#[2,] "N"  "Y"
#[3,] "Y"  "N"
#[4,] "Y"  "N"
#[5,] "Y"  "Y"
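To make the steps above easier to follow, here is the same idea unrolled into separate statements, as a rough sketch rather than a definitive implementation. The names df and split_tokens are made up for illustration and are not part of the question or the original one-liner. It assumes each string holds at most one "N-p" token and one number, with no extra words when the "N-p" token comes first.

```r
# Same sample data as in the question; stringsAsFactors = FALSE keeps plain strings.
df <- data.frame(
  col1 = c("4-p", "4-p 1.0", "2-p", "4-p 1.6", "2-p 1.0"),
  col2 = c("4-p 1.0", "2-p 1.0", "1.6 2-p", "4-p 1.8", "1.0 2-p civic"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: normalise one column into a two-column character matrix.
split_tokens <- function(x) {
  # Step 1: if the string starts with the number, move it behind the "N-p" token
  # (this also drops trailing words such as "civic").
  swapped <- sub("^([0-9.]+)\\s+([0-9]+-p).*", "\\2 \\1", x)
  # Step 2: a string that is only the "N-p" token gets a placeholder number 0.
  padded <- sub("^([A-Za-z0-9-]+)\\b$", "\\1 0", swapped)
  # Step 3: split on the space and stack the pieces row by row.
  do.call(rbind, strsplit(padded, " "))
}

m1 <- split_tokens(df$col1)
m2 <- split_tokens(df$col2)

# Step 4: elementwise comparison, then recode TRUE/FALSE as "Y"/"N".
res <- m1 == m2
df$p_ok <- ifelse(res[, 1], "Y", "N")
df$n_ok <- ifelse(res[, 2], "Y", "N")
df
```

Two caveats with this sketch: if neither column contains a number, both sides get the placeholder 0 and n_ok comes out as "Y", and a string such as "2-p 1.0 civic" (token first, followed by an extra word) would split into three pieces, so it would need an extra cleaning step before rbind.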
Upvotes: 1