Alex
Alex

Reputation: 179

Extract character between the first two characters

I have a table in BigQuery:

ab_col_jfsfhfd_ggg_sdf
arfd_am_fdsf_fddg_fg
d_fdf_fdddg_ffddd_f

I would like to extract those characters that go right after the first _ character and followed by the second _ character. I want to get the following:

col
am
fdf

I used the following regular expression to extract the characters but it does not work as intended:

^.*\_(\D+)\_.*$
regexp_replace(id,'^.*\\_(\\D+)\\_.*$' , '\\1')

Please help!

Upvotes: 0

Views: 958

Answers (3)

Ryszard Czech
Ryszard Czech

Reputation: 18641

Use regexp_extract:

regexp_extract(id,'^[^_]+_([^_]+)')

See proof

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  [^_]+                    any character except: '_' (1 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  _                        '_'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^_]+                    any character except: '_' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271231

split() is a very simply way of solving this. But regular expressions are also quite simple:

with t as (
      select 'ab_col_jfsfhfd_ggg_sdf' as id union all
      select 'arfd_am_fdsf_fddg_fg' union all
      select 'd_fdf_fdddg_ffddd_f'
     )
select id, regexp_extract(id, '[^_]+', 1, 2)
from t;

The logic for the pattern is: "Look for any string of characters that is not an underscore. Then take the second one in the string."

Upvotes: 2

GMB
GMB

Reputation: 222722

If I follow you correctly, you can use split():

(split(col, '_'))[safe_ordinal(2)]

split() turns the string column to an array of values, given a separator (here, we use _). Then we can just grab second array element.

Upvotes: 2

Related Questions