user5249203

Reputation: 4648

Regular expression for validating input string in R

I am trying to write a regular expression in R to validate user input and run the program accordingly. Three types of queries are expected; all are character vectors.

query1 = "Oct4[Title/Abstract] AND BCR-ABL1[Title/Abstract] 
         AND stem cells[Title] AND (2000[PDAT] :2015[PDAT])"
query2 <-c("26527521","26711930","26314551")

The following code works, but the challenge is restricting special characters in both cases:

all(grepl("[A-Za-z]+", query, perl = TRUE)) # evaluates to FALSE for query 2

or as @sebkopf suggested

all(grepl("^[0-9 ,]+$", query)) # evaluates to TRUE only for query 2

However, query 1 also takes a year as input, which means numeric input must be accepted for query 1 as well. To add complexity, the characters space , . - [] () are allowed in query 1. The format for query 2 should be ONLY numbers, separated by , or space; anything else should throw an error.
How can I incorporate both of these conditions into R regular expressions, so that the following if conditions are validated and the respective code is run?

if (all(grepl("regex for query 1& 2", query, perl = TRUE))) {
  # Run code 1
} else {
  print("these characters are not allowed @ ! & % # * ~ `_ = +")
}
if (all(grepl("regex for query3", query, perl = TRUE))) {
  # Run code 2
} else {
  print("these characters are not allowed @ ! & % # * ~ `_ = + [] () - . ")
}

Upvotes: 2

Views: 809

Answers (1)

sebkopf

Reputation: 2375

In your current regexps you are just looking for an occurrence of the pattern ("[A-Za-z]+") anywhere in the query. If you want to allow only certain characters, you need to make sure the pattern matches the whole query by anchoring it with "^...$".
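As a small illustration of the difference, here is a sketch using a made-up input (`bad` is a hypothetical string, not one of your queries). The unanchored pattern succeeds as soon as some digits occur anywhere, while the anchored one requires every character to be a digit:

```r
# Illustrative inputs: ids mirrors the ID vector from the question,
# bad is a hypothetical string containing a forbidden character.
ids <- c("26527521", "26711930", "26314551")
bad <- "26527521@abc"

# Unanchored: TRUE because digits occur somewhere in the string
unanchored <- grepl("[0-9]+", bad)

# Anchored: FALSE because "@abc" is not made of digits
anchored <- grepl("^[0-9]+$", bad)

# Anchored check on the valid IDs: TRUE for every element
ids_ok <- all(grepl("^[0-9]+$", ids))
```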

With regular expressions there are always multiple ways of doing anything, but as an example of matching a query that contains none of a specific set of special characters (with everything else allowed), you could use the following (wrapped in all() here to account for your query3 being a vector):

all(grepl("^[^@!&%#*~`_=+]+$", query)) # evaluates to TRUE for your query1, 2 & 3

To instead do a positive match that only accepts queries made up of numbers, spaces and commas:

all(grepl("^[0-9 ,]+$", query)) # evaluates to TRUE only for query3
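If you want a single branch point, the two ideas can be combined into one helper. This is only a sketch: `classify_query` and the labels it returns are hypothetical names, and the whitelist for the search-string case is my assumption based on the characters you listed plus the `/` and `:` that appear in your example. Adjust it to whatever you actually want to allow.

```r
query1 <- "Oct4[Title/Abstract] AND BCR-ABL1[Title/Abstract] 
         AND stem cells[Title] AND (2000[PDAT] :2015[PDAT])"
query2 <- c("26527521", "26711930", "26314551")

# Hypothetical helper: returns "ids", "search" or "invalid".
classify_query <- function(query) {
  # Only digits, spaces and commas
  id_pattern <- "^[0-9 ,]+$"
  # Assumed whitelist: letters, digits, whitespace and . : / ( ) [ ] -
  text_pattern <- "^[A-Za-z0-9\\s.:/()\\[\\]-]+$"

  if (all(grepl(id_pattern, query))) {
    "ids"
  } else if (all(grepl(text_pattern, query, perl = TRUE))) {
    "search"
  } else {
    "invalid"
  }
}

kind1 <- classify_query(query1)
kind2 <- classify_query(query2)
kind3 <- classify_query("Oct4@home")
```

The ID check runs first because a purely numeric string would also pass the broader whitelist. `perl = TRUE` is used for the second pattern so that the escaped brackets and `\\s` inside the character class behave predictably.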

Upvotes: 1
