Bhavesh

Reputation: 919

Extract the word from String using Regex

I have the following

Input:

 abc_account2621_activity_20161116_20161117_030627_311999667.csv
 xyx_account2622_click_2016111606_20161116_221031_311735299.csv
 sed_account2623_impression_2016111605_20161116_221808_311685411.csv
 abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv
 vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz  
 sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz

Expected output:

activity
click
impression
rich_media
match_table_activity_cats
match_table_activity_types

I want to extract the word(s) that lie between a number followed by an underscore and an underscore followed by a number.

Code tried so far:

 val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"

 val pattern3 = "(_([A-Za-z]+_[0-9]))".r
 val word = pattern3.findFirstIn(x).getOrElse("no match")
 // word: String = _types_2

Upvotes: 2

Views: 6623

Answers (4)

JanLeeYu

Reputation: 1001

You can try this pattern with the global flag set: \d_([a-zA-Z_]+)_[0-9]+_ (the word you want is in capture group 1).
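As a rough sketch of how that pattern could be applied in Scala (the names `janPattern`, `fileNames` and `words` are made up for illustration, and the list simply repeats the file names from the question), `findFirstMatchIn` returns the first match and `group(1)` reads the captured word:

```scala
// Pattern from this answer: a digit, an underscore, the word (letters and
// underscores), then an underscore, a run of digits and another underscore.
val janPattern = """\d_([a-zA-Z_]+)_[0-9]+_""".r

// Sample file names copied from the question.
val fileNames = List(
  "abc_account2621_activity_20161116_20161117_030627_311999667.csv",
  "xyx_account2622_click_2016111606_20161116_221031_311735299.csv",
  "sed_account2623_impression_2016111605_20161116_221808_311685411.csv",
  "abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv",
  "vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz",
  "sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
)

// Keep capture group 1 of the first match; names without a match are dropped.
val words = fileNames.flatMap(name => janPattern.findFirstMatchIn(name).map(_.group(1)))

words.foreach(println)
```

Since only the first match per file name is used here, no global flag is needed on the Scala side.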

Upvotes: 0

jwvh

Reputation: 51271

I often find pattern matching with regex patterns to be handy.

// input is assumed to be a List[String] holding the file names from the question
val pattern = """\d_(\D+)_\d""".r.unanchored
input.collect{case pattern(x) => x}
// res0: List[String] = List(activity, click, impression, rich_media, match_table_activity_cats, match_table_activity_types)

Upvotes: 1

chengpohi

Reputation: 14217

val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
val pattern3 = """_([a-zA-Z_]+)_\d+""".r
pattern3.findAllIn(x).matchData.map(_.group(1)).toList

The pattern _([a-zA-Z_]+)_\d+, combined with matchData to read capture group 1, extracts the word.

See regex101 for this.
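If only the first word per file name is needed, the same pattern could be wrapped in a small helper that returns an `Option`, which covers the "no match" case from the question without a sentinel string. This is only a sketch; `extractWord` and `wordPattern` are hypothetical names, not part of any library:

```scala
// Same pattern as above: underscore, the word (letters and underscores),
// then an underscore followed by at least one digit.
val wordPattern = """_([a-zA-Z_]+)_\d+""".r

// Hypothetical helper: Some(word) for the first match, None when nothing matches.
def extractWord(fileName: String): Option[String] =
  wordPattern.findFirstMatchIn(fileName).map(_.group(1))

println(extractWord("abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv"))
println(extractWord("no_digits_here.csv").getOrElse("no match"))
```

The caller can then decide what to do with a missing word, for example via `getOrElse("no match")` as in the original attempt.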

Upvotes: 2

Ori Marko

Reputation: 58774

Use a regex that captures the run of non-digit characters after the known prefix:

abc_account2621_([\D]+)_

For an arbitrary two-part prefix such as xyx_abc_... use the following and read the second group:

([^_]+_[^_]+_)([\D]+)_
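A minimal Scala sketch of the second pattern, assuming every file name starts with exactly two underscore-separated fields before the word (the names `prefixed`, `samples` and `extracted` are illustrative only). Group 1 holds the prefix, so the word is read from group 2:

```scala
// Group 1: two prefix fields, each ending in an underscore.
// Group 2: the following run of non-digit characters, up to the last
//          underscore that precedes the first digit.
val prefixed = """([^_]+_[^_]+_)([\D]+)_""".r

// A few file names copied from the question.
val samples = List(
  "xyx_account2622_click_2016111606_20161116_221031_311735299.csv",
  "abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv",
  "vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz"
)

// Read group 2 of the first match in each name.
val extracted = samples.flatMap(name => prefixed.findFirstMatchIn(name).map(_.group(2)))

extracted.foreach(println)
```

If a file name has more or fewer prefix fields than assumed, group 2 will start in the wrong place, so this variant depends on the naming scheme staying fixed.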

Upvotes: 2
