Reputation: 4648
I am trying to write a regular expression in R to validate user input and run the program accordingly. Three types of queries are expected, all of them character vectors.
query1 = "Oct4[Title/Abstract] AND BCR-ABL1[Title/Abstract]
AND stem cells[Title] AND (2000[PDAT] :2015[PDAT])"
query2 <-c("26527521","26711930","26314551")
The following code works, but the challenge is restricting special characters in both cases:
all(grepl("[A-Za-z]+", query, perl=TRUE)) # evaluates to FALSE for query 2
or as @sebkopf suggested
all(grepl("^[0-9 ,]+$", query)) # evaluates to TRUE only for query 2
However, query 1 also takes a year as input, which means numeric input should be accepted for query 1 as well. To add complexity, the characters space , . - [] () are allowed in query 1. The format for query 2 should be ONLY numbers, separated by , or space. Anything else should throw an error.
How can I incorporate both of these conditions into an R regular expression, so that the following if conditions are validated and the respective code is run?
if (all(grepl("regex for query 1 & 2", query, perl = TRUE))) {
  # run code 1
} else { print("these characters are not allowed @ ! & % # * ~ `_ = +") }
if (all(grepl("regex for query3", query, perl = TRUE))) {
  # run code 2
} else { print("these characters are not allowed @ ! & % # * ~ `_ = + [] () - . ") }
Upvotes: 2
Views: 809
Reputation: 2375
In your current regexps you are only checking whether the pattern ("[A-Za-z]+") occurs anywhere in the query. If you want to allow only certain characters, you need to make sure the pattern matches across the whole query by anchoring it with "^...$".
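To illustrate the difference anchoring makes, here is a minimal sketch. The input string `bad` and the character class are made up for the example and are not the asker's real patterns.

```r
# Hypothetical input that contains a disallowed character ("@").
bad <- "stem cells 2015 @home"

# Unanchored: succeeds as soon as one letter occurs anywhere in the string.
unanchored <- grepl("[A-Za-z]+", bad)

# Anchored: every character from start (^) to end ($) must be in the class,
# so the "@" makes the whole match fail.
anchored <- grepl("^[A-Za-z0-9 ]+$", bad)

print(unanchored)
print(anchored)
```

The unanchored call accepts the string even though it contains "@", while the anchored call rejects it.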
With regular expressions there are always multiple ways of doing anything, but to give an example of matching a query that contains none of the specific special characters (with everything else allowed), you could use the following (here wrapped in all to account for your query3 being a vector):
all(grepl("^[^@!&%#*~`_=+]+$", query)) # evaluates to TRUE for your query1, 2 & 3
To instead do a positive match that only accepts queries made up of numbers, spaces, and commas:
all(grepl("^[0-9 ,]+$", query)) # evaluates to TRUE only for query3
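Putting the two checks together, one possible sketch is a small helper that decides which branch to run. The names `classify_query`, `text_pattern`, `id_pattern`, `text_query` and `id_query` are hypothetical. The text allow-list uses the characters the question lists (letters, digits, space , . - [] ()), and additionally assumes that "/", ":" and a line break are acceptable, because the sample text query contains them.

```r
# Allow-list for free-text queries: letters, digits, space, newline,
# and , . ( ) [ ] - plus "/" and ":" (assumed, since the sample query uses them).
text_pattern <- "^[A-Za-z0-9 \n,.()\\[\\]/:-]+$"

# Allow-list for ID queries: digits separated by commas or spaces only.
id_pattern <- "^[0-9 ,]+$"

# Returns "ids" or "text", and throws an error for anything else.
classify_query <- function(query) {
  if (all(grepl(id_pattern, query))) {
    "ids"
  } else if (all(grepl(text_pattern, query, perl = TRUE))) {
    "text"
  } else {
    stop("query contains characters that are not allowed")
  }
}

# Sample inputs modelled on the question.
text_query <- "Oct4[Title/Abstract] AND BCR-ABL1[Title/Abstract] AND stem cells[Title] AND (2000[PDAT] :2015[PDAT])"
id_query <- c("26527521", "26711930", "26314551")

print(classify_query(text_query))
print(classify_query(id_query))
```

The ID check runs first because a digits-only query would also pass the broader text allow-list. The result can then drive the branching, for example with `switch(classify_query(query), ids = ..., text = ...)`.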
Upvotes: 1