Reputation: 919
I have the following file names.
Input:
abc_account2621_activity_20161116_20161117_030627_311999667.csv
xyx_account2622_click_2016111606_20161116_221031_311735299.csv
sed_account2623_impression_2016111605_20161116_221808_311685411.csv
abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv
vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz
sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz
Expected output:
activity
click
impression
rich_media
match_table_activity_cats
match_table_activity_types
I want to extract the word that lies between a number followed by an underscore and an underscore followed by a number, i.e. [Number + Underscore] word [Underscore + Number].
Code tried so far:
val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
val pattern3 = "(_([A-Za-z]+_[0-9]))".r
val word = pattern3.findFirstIn(x).getOrElse("no match")
word: String = _types_2
Upvotes: 2
Views: 6623
Reputation: 1001
You may try this one:
\d_([a-zA-Z_]+)_[0-9]+_
with the global flag set. The word you want is in capture group 1.
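Scala's Regex has no global flag: findFirstMatchIn returns the first match and findAllMatchIn returns every match. A minimal sketch of how the regex above might be used from Scala, with the object name DigitBounded and the method name extract made up for illustration:

```scala
object DigitBounded {
  // Regex from this answer: digit, underscore, (letters and underscores),
  // underscore, digits, underscore. Group 1 is the word we want.
  private val pattern = """\d_([a-zA-Z_]+)_[0-9]+_""".r

  // Some(word) when the file name matches, None otherwise.
  def extract(fileName: String): Option[String] =
    pattern.findFirstMatchIn(fileName).map(_.group(1))
}
```

For a whole list of names, something like files.flatMap(name => DigitBounded.extract(name)) should keep only the names that match.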
Upvotes: 0
Reputation: 51271
I often find pattern matching with regex patterns to be handy.
val pattern = """\d_(\D+)_\d""".r.unanchored
input.collect{case pattern(x) => x}
// res0: List(activity, click, impression, rich_media, match_table_activity_cats, match_table_activity_types)
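The snippet above does not show how input is defined. A self-contained sketch, assuming input is simply a List of the six file names from the question (the object name CollectDemo is made up for illustration):

```scala
object CollectDemo {
  // Assumption: the input is a plain List[String] of the question's file names.
  val input: List[String] = List(
    "abc_account2621_activity_20161116_20161117_030627_311999667.csv",
    "xyx_account2622_click_2016111606_20161116_221031_311735299.csv",
    "sed_account2623_impression_2016111605_20161116_221808_311685411.csv",
    "abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv",
    "vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz",
    "sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
  )

  // unanchored lets the extractor match anywhere in the string
  // instead of requiring the whole string to match.
  val pattern = """\d_(\D+)_\d""".r.unanchored

  // collect drops any name that does not match the pattern.
  val result: List[String] = input.collect { case pattern(x) => x }
}
```

Because \D also matches underscores, the greedy \D+ backtracks to the last underscore that is followed by a digit, which is why multi-part names such as match_table_activity_cats come out whole.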
Upvotes: 1
Reputation: 14217
val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
val pattern3 = """_([a-zA-Z_]+)_\d+""".r
pattern3.findAllIn(x).matchData.map(_.group(1)).toList
The pattern _([a-zA-Z_]+)_\d+, combined with matchData to reach the capture group, can capture this.
See regex101 for this.
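To see why the pattern from the question returned _types_2, it may help to run both patterns side by side. In the original, the character class [A-Za-z] has no underscore, so it can only match the last word before the digits, and findFirstIn returns the whole match rather than a group. A small sketch (the names Compare, attempted and corrected are made up for illustration):

```scala
object Compare {
  val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"

  // Pattern from the question: no underscore inside the character class.
  val original = "(_([A-Za-z]+_[0-9]))".r
  // Pattern from this answer: underscore allowed inside the captured word.
  val fixed = """_([a-zA-Z_]+)_\d+""".r

  // Whole match of the original pattern: Some("_types_2")
  val attempted: Option[String] = original.findFirstIn(x)
  // Group 1 of the fixed pattern: Some("match_table_activity_types")
  val corrected: Option[String] = fixed.findFirstMatchIn(x).map(_.group(1))
}
```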
Upvotes: 2
Reputation: 58774
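Use a regex that matches non-digit characters (the word is in group 1):
abc_account2621_([\D]+)_
For other prefixes, such as xyx_abc_..., use the following (the word is in group 2):
([^_]+_[^_]+_)([\D]+)_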
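A rough Scala sketch of the second, prefix-independent pattern, assuming the first two underscore-separated fields are always the prefix and the account. The names PrefixAgnostic and extract are made up for illustration:

```scala
object PrefixAgnostic {
  // Group 1: the first two underscore-separated fields (prefix and account).
  // Group 2: the run of non-digit characters that follows, which is the word we want.
  private val pattern = """([^_]+_[^_]+_)([\D]+)_""".r

  // Some(word) when the file name matches, None otherwise.
  def extract(fileName: String): Option[String] =
    pattern.findFirstMatchIn(fileName).map(_.group(2))
}
```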
Upvotes: 2