Reputation: 872

Regex to select text after second occurrence of special character

I have the following text:

/sources/x1/y1/CLD_strict_AccountsMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdGroupMatchFile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdsMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdvertisersMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AudienceMatchFile_feed_2015062405.txt

I would like to match everything after the second "_" and before "Matchfile", using bash.

So for example:

on line 1 I want to select "Accounts"
on line 2 I want to select "AdGroup"

etc.

How would I do this?

Upvotes: 1

Answers (5)

anubhava

Reputation: 785038

This can be done easily using awk:

awk -F 'Match[Ff]ile|_' '{print $3}' file
Accounts
AdGroup
Ads
Advertisers
Audience
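To see why `$3` is the field you want, it can help to print every field the separator produces for one path. This is only a sketch; `show_fields` is a made-up helper name, not part of the answer above.

```shell
# Hypothetical helper: print each field awk sees for a single path,
# using the same field separator as the one-liner above.
show_fields() {
  printf '%s\n' "$1" |
    awk -F 'Match[Ff]ile|_' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields "/sources/x1/y1/CLD_strict_AccountsMatchfile_feed_2015062405.txt"
```

Both "_" and "Matchfile"/"MatchFile" act as separators, so the text between the second "_" and "Matchfile" ends up in field 3. Field 4 is empty because "Matchfile" is immediately followed by another "_".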

Upvotes: 1

zolo

Reputation: 469

With sed:

sed 's/^[^_]\+_[^_]\+_\(.*\)Match[fF]ile.*/\1/' filename

or with grep (GNU grep, as found on Linux, where -P and -o are supported):

grep -o -P '^[^_]+_[^_]+_\K.*(?=Match[fF]ile)' filename
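Note that `\+` in a basic regular expression is a GNU extension, so the sed command may not work with other sed implementations. A possible variant using extended regular expressions (`-E`) is sketched below; `extract_names` is just an illustrative wrapper name, and it assumes the directory part of the path contains no underscores.

```shell
# Hypothetical helper: reads paths on stdin and prints the text between
# the second "_" and "Matchfile"/"MatchFile" for each line.
extract_names() {
  sed -E 's/^[^_]+_[^_]+_(.*)Match[fF]ile.*/\1/'
}

printf '%s\n' \
  '/sources/x1/y1/CLD_strict_AdsMatchfile_feed_2015062405.txt' \
  '/sources/x1/y1/CLD_strict_AudienceMatchFile_feed_2015062405.txt' |
  extract_names
```

Lines that do not match the pattern are printed unchanged, the same as with the original command.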

Upvotes: 2

Nathan Wilson

Reputation: 856

Since you asked how to do it using bash: ${line%%Match[Ff]ile*} will remove "Matchfile" (or "MatchFile") and everything after it, and ${line##*_} will then remove everything up to and including the last remaining underscore. So you can loop through the text like:

$ while IFS= read -r line; do line=${line%%Match[Ff]ile*}; line=${line##*_}; echo "$line"; done < filename
Accounts
AdGroup
Ads
Advertisers
Audience
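To make the two steps easier to follow, here is the same pair of expansions applied to a single line, with the intermediate value kept. The variable names `step1` and `step2` are only for illustration.

```shell
line='/sources/x1/y1/CLD_strict_AdGroupMatchFile_feed_2015062405.txt'

# %% removes the longest suffix matching the pattern:
# "MatchFile" and everything after it.
step1=${line%%Match[Ff]ile*}

# ## removes the longest prefix matching the pattern:
# everything up to and including the last "_" that is left.
step2=${step1##*_}

printf '%s\n' "$step1" "$step2"
```

After the first expansion only two underscores remain, so stripping through the last one leaves exactly the text after the second "_".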

Upvotes: 4

Marcaitus

Reputation: 111

cut -d'_' -f2- text-file | grep -oP "(?<=_).*(?=Match[Ff]ile)"

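cut first drops everything up to the first "_". grep then uses a lookbehind and a lookahead: the text inside those parentheses must be present on either side of the match, but is not included in the output.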

Upvotes: 2

beresfordt

Reputation: 5222

In case you hadn't noticed, some of these have Matchfile and some MatchFile. Assuming either is acceptable as the end of the group you want to match, the following perl one-liner will work:

perl -pe 's/^.*_.*_(.*)Match[Ff]ile.*$/$1/'

find /sources/x1/y1/ -type f -name "*.txt" |  perl -pe 's/^.*_.*_(.*)Match[Ff]ile.*$/$1/'

will print

Accounts
AdGroup
Ads
Advertisers
Audience

(if the above are the only files in that dir which end in .txt)

Upvotes: 2
