Reputation: 872
I have the following text:
/sources/x1/y1/CLD_strict_AccountsMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdGroupMatchFile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdsMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AdvertisersMatchfile_feed_2015062405.txt
/sources/x1/y1/CLD_strict_AudienceMatchFile_feed_2015062405.txt
And would like to match on everything after the 2nd "_" and before "Matchfile" using bash.
So for example:
etc.
How would I do this?
Upvotes: 1
Views: 245
Reputation: 785038
This can be done easily using awk:
awk -F 'Match[Ff]ile|_' '{print $3}' file
Accounts
AdGroup
Ads
Advertisers
Audience
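To see why the third field is the one wanted, it can help to print every field that the separator regex produces. The following is only a sketch: show_fields is a made-up helper name, and the sample path is copied from the question.

```shell
#!/bin/sh
# Sample path copied from the question.
path='/sources/x1/y1/CLD_strict_AccountsMatchfile_feed_2015062405.txt'

# show_fields is a hypothetical helper (not part of the answer above).
# It splits its argument on "_" or "Matchfile"/"MatchFile" and prints
# each resulting field on its own line, prefixed with its field number.
show_fields() {
    printf '%s\n' "$1" |
        awk -F 'Match[Ff]ile|_' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields "$path"
```

The first two fields are the text before each of the first two underscores, so $3 is the name. Because Matchfile is immediately followed by another underscore, two separators sit side by side and the field after the name comes out empty.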
Upvotes: 1
Reputation: 469
With sed (GNU sed, since \+ is a GNU extension):
sed 's/^[^_]\+_[^_]\+_\(.*\)Match[fF]ile.*/\1/' filename
Or with grep (GNU grep, which supports -P and -o):
grep -o -P '^[^_]+_[^_]+_\K.*(?=Match[fF]ile)' filename
Upvotes: 2
Reputation: 856
Since you asked how to do it using bash:
${line%%Match[Ff]ile*}
removes Matchfile (or MatchFile) and everything after it, and ${line##*_}
then removes everything up to and including the last remaining underscore. So you can loop through the text like:
$ while IFS= read -r line; do line=${line%%Match[Ff]ile*}; line=${line##*_}; echo "$line"; done < filename
Accounts
AdGroup
Ads
Advertisers
Audience
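As a sketch, the two expansions can also be wrapped in a small function so they can be tried on a single path without a loop. The name extract_name is hypothetical, and the sample paths are copied from the question.

```shell
#!/usr/bin/env bash
# extract_name is a hypothetical helper name used only for this sketch.
extract_name() {
    local line=$1
    line=${line%%Match[Ff]ile*}   # drop "Matchfile"/"MatchFile" and everything after it
    line=${line##*_}              # drop everything up to and including the last remaining "_"
    printf '%s\n' "$line"
}

extract_name "/sources/x1/y1/CLD_strict_AccountsMatchfile_feed_2015062405.txt"
extract_name "/sources/x1/y1/CLD_strict_AdGroupMatchFile_feed_2015062405.txt"
```

Note that this keeps the text after the last underscore before Matchfile, which is the same as "after the 2nd underscore" only when the name itself contains no underscore, as in the sample data.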
Upvotes: 4
Reputation: 111
cut -d'_' -f2- text-file | grep -oP "(?<=_).*(?=Match[Ff]ile)"
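cut first drops everything up to and including the first underscore, so the first underscore grep sees is the original second one. The regex then uses a lookbehind and a lookahead: the text inside those parentheses must be present, but it is not included in the match that -o prints.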
Upvotes: 2
Reputation: 5222
In case you hadn't noticed, some of these have Matchfile and some MatchFile. Assuming either is acceptable as the end of the group you want to match, the following perl one-liner will work:
perl -pe 's/^.*_.*_(.*)Match[Ff]ile.*$/$1/'
For example:
find /sources/x1/y1/ -type f -name "*.txt" | perl -pe 's/^.*_.*_(.*)Match[Ff]ile.*$/$1/'
will print
Accounts
AdGroup
Ads
Advertisers
Audience
(if the above are the only files in that dir which end in .txt)
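The same capture-group idea can be sketched in bash itself with the [[ =~ ]] operator, which stores parenthesised groups in BASH_REMATCH. This is an illustration only: match_name is a made-up helper, and the pattern uses [^_]* where the perl version uses .* so that exactly two underscores are skipped.

```shell
#!/usr/bin/env bash
# match_name is a hypothetical helper name used only for this sketch.
match_name() {
    # Two underscore-free runs, each followed by "_", then capture
    # everything up to "Matchfile" or "MatchFile".
    local re='^[^_]*_[^_]*_(.*)Match[Ff]ile'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1   # no match: print nothing and signal failure
    fi
}

match_name "/sources/x1/y1/CLD_strict_AdGroupMatchFile_feed_2015062405.txt"
```

Keeping the regex in a variable avoids quoting problems on the right-hand side of =~. Lines that do not contain Matchfile or MatchFile produce no output and a non-zero exit status, unlike the perl substitution, which prints such lines unchanged.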
Upvotes: 2