pele88

Reputation: 872

Regex to select text after second occurrence of special character

I have the following text:

/sources/x1/y1/CLD_strict_AccountsMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdGroupMatchFile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdsMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdvertisersMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AudienceMatchFile_feed_2015062405.txt

I would like to match everything after the second "_" and before "Matchfile", using bash.

So for example:

etc.

How would I do this?

Upvotes: 1

Views: 245

Answers (5)

anubhava

Reputation: 785038

This can be done easily using awk:

awk -F 'Match[Ff]ile|_' '{print $3}' file
Accounts
AdGroup
Ads
Advertisers
Audience
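
To see why $3 is the field that holds the name, here is a minimal sketch that prints every field awk produces for one of the sample paths. The function name show_fields is just an illustrative helper, not part of the answer above.

```shell
# show_fields (hypothetical helper): split one line on "_" or "Match[Ff]ile"
# and print each resulting field as "index=value".
show_fields() {
  printf '%s\n' "$1" |
    awk -F 'Match[Ff]ile|_' '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

show_fields '/sources/x1/y1/CLD_strict_AccountsMatchfile_feed_2015062405.txt'
```

Field 1 is the path up to the first underscore, field 2 is "strict", and field 3 is the part between the second underscore and Matchfile.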

Upvotes: 1

zolo

Reputation: 469

With sed:

cat filename|sed 's/^[^_]\+_[^_]\+_\(.*\)Match[fF]ile.*/\1/g'

Or with grep (GNU grep, as found on Linux, which supports -P and -o):

cat filename|grep -o -P "^[^_]+_[^_]+_\K(.*)(?=Match[fF]ile)"

Upvotes: 2

Nathan Wilson

Reputation: 856

Since you asked how to do it using bash: ${line%%Match[Ff]ile*} removes Matchfile (or MatchFile) and everything after it, and ${line##*_} then removes everything up to and including the last remaining underscore. So you can loop through the text like:

$ while read -r line; do line=${line%%Match[Ff]ile*}; line=${line##*_}; echo "$line"; done < filename
Accounts
AdGroup
Ads
Advertisers
Audience
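
The same two expansions can be wrapped in a small function, which makes each step easier to follow. This is only a sketch; the name extract_name and the variable name are made up for illustration.

```shell
# extract_name (hypothetical helper): apply the two parameter expansions
# to a single path and print the result.
extract_name() {
  _name=$1
  _name=${_name%%Match[Ff]ile*}  # drop "Matchfile"/"MatchFile" and everything after it
  _name=${_name##*_}             # drop everything up to and including the last "_" left
  printf '%s\n' "$_name"
}

extract_name '/sources/x1/y1/CLD_strict_AdGroupMatchFile_feed_2015062405.txt'
```

Because the first expansion cuts the line before Matchfile, the "last" underscore seen by the second expansion is the second underscore of the original line.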

Upvotes: 4

Marcaitus

Reputation: 111

cut -d'_' -f2- text-file | grep -oP "(?<=_).*(?=Match[Ff]ile)"

This uses lookbehind and lookahead assertions: the text in the parentheses must be present next to the match, but it is not included in what grep prints.
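
As a rough sketch of the pipeline on a single line, assuming a grep built with PCRE support (-P, as in GNU grep); the function name between is made up for illustration:

```shell
# between (hypothetical helper): run the cut | grep pipeline on one line.
between() {
  printf '%s\n' "$1" |
    cut -d'_' -f2- |                       # drop the first "_"-separated field
    grep -oP '(?<=_).*(?=Match[Ff]ile)'    # print only what sits between "_" and Match[Ff]ile
}

between '/sources/x1/y1/CLD_strict_AdsMatchfile_feed_2015062405.txt'
```

After cut, the line starts with strict_, so the lookbehind anchors the match just after that remaining underscore and the lookahead stops it just before Matchfile.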

Upvotes: 2

beresfordt

Reputation: 5222

In case you hadn't noticed, some of these have Matchfile and some MatchFile. Assuming either is acceptable as the end of the group you want to match, the following perl one-liner will work:

perl -pe 's/^.*_.*_(.*)Match[Ff]ile.*$/$1/'

For example:

find /sources/x1/y1/ -type f -name "*.txt" |  perl -pe 's/^.*_.*_(.*)Match[Ff]ile.*$/$1/'

will print

Accounts
AdGroup
Ads
Advertisers
Audience

(if the above are the only files in that dir which end in .txt)

Upvotes: 2
