aa710
aa710

Reputation: 69

Regex: matching multiple patterns and getting middle of string

I am working on a code that takes a bunch of SQL queries and aims to break down the queries only into the table names.

For example I have the following queries:

delete from pear.admin where jdjdj
delete from pear.admin_user where blah
delete from ss_pear.admin_user where blah 

I am trying to get a regex that matches all these patterns, would that be through creating a list of multiple patterns first and then passing it through str_extract ?

I used a regex but it's giving me the following output:

delete from pear.admin 

how do I get rid of the first words before it? I tried (.*) but nothing seems to work.

sql_data$table_name <- 
str_extract(sql_data$Full.Sql, "[^_]+\\.[\\w]+\\_[\\w]+")

Upvotes: 4

Views: 310

Answers (5)

Sotos
Sotos

Reputation: 51592

Here is a non-regex approach using strsplit to split from from, and extract the first word from the second element, i.e.

sapply(strsplit(queries, ' from '), function(i)gsub('\\s.*', '', i[2]))
#[1] "pear.admin"         "pear.admin_user"    "ss_pear.admin_user"

NOTE: This will work If your table names DO NOT (and should not) have spaces in them

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388982

Here is an option with qdapRegex::ex_between with no regex

qdapRegex::ex_between(x, "from", "where")

#[[1]]
#[1] "pear.admin"

#[[2]]
#[1] "pear.admin_user"

#[[3]]
#[1] "ss_pear.admin_user"

data

x <- c("delete from pear.admin where jdjdj", 
       "delete from pear.admin_user where blah", 
       "delete from ss_pear.admin_user where blah")

Upvotes: 0

Jan
Jan

Reputation: 43169

You may use

gsub(".*?from\\s(\\S+).*","\\1", sql.data$Full.Sql)

See a demo on regex101.com.

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521289

I am only familiar with the base R regex functions, so here is an option using sub:

queries <- c("delete from pear.admin where jdjdj",
             "delete from pear.admin_user where blah",
             "delete from ss_pear.admin_user where blah")

table_names <- sapply(queries, function(x) {
    sub(".*\\bfrom\\s+(\\S+).*", "\\1", x)
})
table_names

           1                    2                    3 
"pear.admin"    "pear.admin_user" "ss_pear.admin_user" 

This should perform at least somewhat reliably, since, as far as I know, what immediately followed the keyword FROM must be a table name.

Upvotes: 4

boski
boski

Reputation: 2467

I you only want what is in between from and where (from what I understood) try the following regex

gsub("(.*from )(.*)( where.*)","\\2",sql.data$Full.Sql)

Upvotes: 0

Related Questions