Reputation: 43
I'm trying to match a partial pattern in the variable names of my data set and replace every match with another pattern using gsubfn().
I'm using R version 4.0.3 (2020-10-10).
The code below shows a sample of the variable names in the data set and how I tried to replace them:
replace_str = c("Race..American.India", "Race.White")
gsubd_str = gsubfn(pattern = "Race..| Race.", "R_", x = replace_str)
When I used the pattern string as above, my output is:
> gsubd_str
[1] "R_American.India" "R_hite"
However, if I swap the order of the alternatives in the pattern:
gsubd_str = gsubfn(pattern = "Race.| Race..", "R_", x = replace_str)
then my output is:
> gsubd_str
[1] "R_.American.India" "R_White"
In both cases, it seems to me that gsubfn() is not behaving as expected.
At least in the second case, gsubfn() replaced the match as soon as the LHS of "|" was TRUE.
However, in the first case, after the match was found, gsubfn() replaced 3 characters ("R", "." and "W") instead of 2 ("R" and ".").
I'm not sure whether I have understood gsubfn() correctly.
Upvotes: 0
Views: 90
Reputation: 4841
It is the space you added to the pattern. The behavior of gsubfn is exactly like that of gsub, as the documentation states:
# with the space
x <- c("Race..American.India", "Race.White")
gsub("Race..| Race.", "R_", x)
#R> [1] "R_American.India" "R_hite"
gsub("Race.| Race..", "R_", x)
#R> [1] "R_.American.India" "R_White"
# without the space
gsub("Race..|Race.", "R_", x)
#R> [1] "R_American.India" "R_hite"
gsub("Race.|Race..", "R_", x)
#R> [1] "R_American.India" "R_hite"
library(gsubfn)
gsubfn("Race..|Race.", "R_", x)
#R> [1] "R_American.India" "R_hite"
gsubfn("Race.|Race..", "R_", x)
#R> [1] "R_American.India" "R_hite"
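To make the effect of the space concrete, here is a minimal sketch using base R only. The `grepl()` check and the variable names `has_space_match`, `with_space` and `without_dead` are additions for illustration, not part of the original question. The alternative that starts with a space can never match these names, so the pattern behaves as if that alternative were not there:

```r
x <- c("Race..American.India", "Race.White")

# The alternative " Race." requires a literal space, which these names lack,
# so it never matches anything in x.
has_space_match <- grepl(" Race.", x)
print(has_space_match)
#R> [1] FALSE FALSE

# Dropping the dead alternative gives the same result as the
# two-alternative pattern from the question.
with_space   <- gsub("Race.| Race..", "R_", x)
without_dead <- gsub("Race.", "R_", x)
print(identical(with_space, without_dead))
#R> [1] TRUE
```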
Though, you can just do:
gsub("Race..?", "R_", x)
#R> [1] "R_American.India" "R_hite"
You might also want to escape the dots with \\. so that they match a literal period. Otherwise, you may end up with strange results like:
gsub("Race..?", "R_", c("Racehorses", "Racecourse", "Racerunner"))
#R> [1] "R_rses" "R_urse" "R_nner"
gsub("Race\\.\\.?", "R_", c("Racehorses", "Racecourse", "Racerunner"))
#R> [1] "Racehorses" "Racecourse" "Racerunner"
# still works
gsub("Race\\.\\.?", "R_", x)
#R> [1] "R_American.India" "R_White"
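Since the goal in the question is to rename variables, the escaped pattern can be applied to `names()` directly. This is a minimal sketch, assuming a small data frame `df` that is made up here purely for illustration. The `^` anchor is an extra assumption (not in the answer above) so that only a leading prefix is rewritten:

```r
# Hypothetical data frame with column names like those in the question
df <- data.frame(Race..American.India = c(1, 0),
                 Race.White = c(0, 1),
                 Racehorses = c(3, 4))

# Replace "Race." or "Race.." at the start of a name with "R_";
# the escaped dots only match literal periods, so "Racehorses" is untouched.
names(df) <- gsub("^Race\\.\\.?", "R_", names(df))
print(names(df))
#R> [1] "R_American.India" "R_White"          "Racehorses"
```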
In both the cases, my thoughts are that gsubfn() is not behaving as expected. ...
Yes, this seems like an issue with gsubfn. It works with gsub as shown below. A workaround is to change the regular expression to "Race..?":
# works fine w/ gsub
x <- c("Race..American.India", "Race.White")
gsub("Race..| Race.", "R_", x)
#R> [1] "R_American.India" "R_hite"
gsub("Race.|Race..", "R_", x)
#R> [1] "R_American.India" "R_hite"
# does not work with gsubfn
library(gsubfn)
gsubfn("Race..| Race.", "R_", x)
#R> [1] "R_American.India" "R_hite"
gsubfn("Race.| Race..", "R_", x)
#R> [1] "R_.American.India" "R_White"
# you can do
gsubfn("Race..?", "R_", x)
#R> [1] "R_American.India" "R_hite"
It is clearly stated in the manual page of gsubfn that:
If replacement is a string then it acts like gsub.
Thus, this must be a bug or maybe this is the catch from the documentation:
Note that if the "R" engine is used and if backref is non-negative then internally the pattern will be parenthesized.
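As a side note on why the order of the alternatives made no difference in the space-free gsub calls above: the default regex engine in gsub picks the longest of the alternatives that match at a given position, whereas `perl = TRUE` picks the first alternative that matches. The comparison below is a sketch added for illustration (the names `default_res` and `perl_res` are not from the original post):

```r
x <- c("Race..American.India", "Race.White")

# Default engine: the longest matching alternative wins, regardless of order
default_res <- gsub("Race.|Race..", "R_", x)
print(default_res)
#R> [1] "R_American.India" "R_hite"

# perl = TRUE: the first matching alternative wins, so order matters
perl_res <- gsub("Race.|Race..", "R_", x, perl = TRUE)
print(perl_res)
#R> [1] "R_.American.India" "R_White"
```

With `perl = TRUE` the result matches the intuition in the question that the LHS of "|" is tried first.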
Upvotes: 2