AndyPerlitch

Reputation: 4729

Sed: Extracting regex pattern from lines

I have an input stream of many lines which look like this:

path/to/file:             example: 'extract_me.proto'
path/to/other-file:             example: 'me_too.proto'
path/to/something/else:             example: 'and_me_2.proto'
...

I'd like to just extract the *.proto filenames from these lines, and I have tried:

[INPUT] | sed 's/^.*\([a-zA-Z0-9_]+\.proto\).*$/\1/'

I know that part of my problem is that .* is greedy, so I expect to get things like e.proto, o.proto and 2.proto, but I can't even get that far: it just outputs the same lines as the input. Any help would be greatly appreciated.

Upvotes: 0

Views: 83

Answers (5)

Cyrus

Reputation: 88601

With GNU sed:

sed -E "s/.*'([^']+)'$/\1/"

Upvotes: 1

glenn jackman

Reputation: 246807

Since you tag your command with , I'll assume you have GNU grep. Pick one of

grep -oP '\w+\.proto' file
grep -oE "[^']+\\.proto" file

Upvotes: 2

sat

Reputation: 14949

Use this sed:

sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\).*$/\1/"

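+ is an extended-regex operator. In sed's default basic regex a bare + is a literal plus sign, so you need to escape it (\+, a GNU extension) to get its special meaning: the preceding item will be matched one or more times.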
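To make the point about + concrete, here is a small sketch that runs four spellings of the same substitution on the first sample line from the question. It assumes GNU sed; the variable names are only for illustration and are not part of the original answer. The \{1,\} form is the POSIX basic-regex way to say "one or more" and should also work with sed implementations that lack \+.

```shell
# Sample line taken from the question.
line="path/to/file:             example: 'extract_me.proto'"

# Basic regex, unescaped +: the + is a literal plus sign, so the pattern
# never matches and sed prints the line unchanged.
bre_literal=$(printf '%s\n' "$line" | sed "s/^.*'\([a-zA-Z0-9_]+\.proto\).*$/\1/")

# Basic regex with \+ (GNU extension): one or more of the preceding item.
bre_escaped=$(printf '%s\n' "$line" | sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\).*$/\1/")

# Extended regex (-E): + and ( ) are special without backslashes.
ere=$(printf '%s\n' "$line" | sed -E "s/^.*'([a-zA-Z0-9_]+\.proto).*$/\1/")

# POSIX basic regex interval: \{1,\} also means "one or more".
bre_posix=$(printf '%s\n' "$line" | sed "s/^.*'\([a-zA-Z0-9_]\{1,\}\.proto\).*$/\1/")

printf '%s\n' "$bre_literal" "$bre_escaped" "$ere" "$bre_posix"
```

The first result is the untouched input line, which is the behaviour described in the question; the other three should each print extract_me.proto.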

Another way:

sed "s/^.*'\([^']\+\.proto\)'.*$/\1/"

Upvotes: 1

Emily Shepherd

Reputation: 1369

I find it helpful to use extended regex for this purpose (-r), in which case you need not escape the parentheses or the +.

sed -r 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/'

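The .* is still greedy, but the added [^a-zA-Z0-9_] has to match the character just before the captured group, so the .* can no longer eat into the filename and the group must start at its first character.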
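A quick way to see the effect of that guard is to run the substitution with and without it on one of the sample lines from the question. This is only a sketch, assuming GNU sed (-E is equivalent to -r there); the variable names are made up for illustration.

```shell
# Sample line taken from the question.
line="path/to/other-file:             example: 'me_too.proto'"

# No guard: the greedy .* takes as much as it can and leaves the group
# only the single character it needs before ".proto".
unguarded=$(printf '%s\n' "$line" | sed -E 's/^.*([a-zA-Z0-9_]+\.proto).*$/\1/')

# With the guard: a non-word character must sit right before the group,
# so the group has to begin where the filename begins.
guarded=$(printf '%s\n' "$line" | sed -E 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/')

printf '%s\n' "$unguarded" "$guarded"
```

The unguarded form should give the truncated o.proto that the question anticipated, while the guarded form gives me_too.proto.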

Upvotes: 2

Jean-François Fabre

Reputation: 140168

One way to do it:

sed 's/^.*[^a-zA-Z0-9_]\([a-zA-Z0-9_]\+\.proto\).*$/\1/'
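  • escaped the + char (in basic regex a bare + is literal; GNU sed needs \+ for "one or more")
  • put a negated class before the alphanumeric-plus-underscore group to mark where the filename starts

Another way: use the single quotes as delimiters; after all, that's what they are there for:

sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\)'.*\$/\1/"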

Upvotes: 1
