Reputation: 6300

Extract the string matched in a regex, not the line, with awk

This should not be too difficult but I could not find a solution.

I have an HTML file, and I want to extract all URLs that match a specific pattern.

The pattern is /users/<USERNAME>/ - I actually only need the USERNAME.

I only got this far:

awk '/users\/.*\//{print $0}' file

But this prints the complete matching line. I don't want the whole line.

Even just the whole URL would be fine (e.g. get /users/USERNAME/), but I really only need the USERNAME.

Upvotes: 0

Answers (3)

Reputation: 784898

If you want to do this in a single awk command, use the match function:

awk -v s="/users/" 'match($0, s "[^/[:blank:]]+") {
   print substr($0, RSTART+length(s), RLENGTH-length(s))
}' file

Or else this grep + cut will do the job:

grep -Eo '/users/[^/[:blank:]]+' file | cut -d/ -f3
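
As a quick check of the match() approach, here is a minimal sketch run against a made-up sample file (sample.html and its contents are assumptions, not taken from the question). Since match() only reports the first hit on a line, the sketch loops over the rest of the line so that several URLs on one line are all picked up.

```shell
# Hypothetical sample input; the file name and contents are invented for illustration.
cat > sample.html <<'EOF'
<a href="/users/alice/">Alice</a> <a href="/users/bob/profile">Bob</a>
<p>no links here</p>
<a href="/users/carol/">Carol</a>
EOF

# match() sets RSTART and RLENGTH for the first hit in its first argument.
# After printing the username, drop everything up to the end of that hit
# and search the remainder of the line again.
usernames=$(awk -v s="/users/" '{
    line = $0
    while (match(line, s "[^/[:blank:]]+")) {
        print substr(line, RSTART + length(s), RLENGTH - length(s))
        line = substr(line, RSTART + RLENGTH)
    }
}' sample.html)

printf '%s\n' "$usernames"
```

On this sample the loop should print alice, bob and carol, one per line; without the loop, bob would be missed because it is the second match on its line.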

Upvotes: 2

Reputation: 67467

Set the field delimiter to /, do a literal match on the second field, and print the third.

$ awk -F/ '$2=="users"{print $3}' file
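
To see what the field split does, here is a small sketch on made-up input (the file name urls.txt and its lines are assumptions). With / as the separator, the test passes only when users sits between the first and second slash on the line, so a URL that is preceded by another slash, such as one with a scheme and host, is skipped.

```shell
# Hypothetical input; the names and URLs are invented for illustration.
cat > urls.txt <<'EOF'
/users/alice/
<a href="/users/bob/">Bob</a>
/groups/admins/
<a href="https://example.com/users/carol/">Carol</a>
EOF

# With -F/ the line "/users/alice/" splits into "", "users", "alice", "".
# In the last line the second field is empty (between the two slashes of
# "https://"), so that line is not printed.
by_field=$(awk -F/ '$2 == "users" { print $3 }' urls.txt)

printf '%s\n' "$by_field"
```

On this sample only alice and bob are printed, which is fine for relative links but worth keeping in mind if the file also contains absolute URLs.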

Upvotes: 1

Reputation: 84

Assuming your command gives you an entire line that looks something like /users/USERNAME/garbage/otherStuff/

You could pipe this result through head, provided you know the line will always start with /users/USERNAME/...

After piping through head, you can also use cut commands to remove more of the end text until you have only the piece you want.

The command will look something like this:
awk '/users\/.*\//{print $0}' file | head (options) | cut (options)
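
One way to fill in the placeholders, as a sketch only: the options below are guesses at what is intended here, and the sample file paths.txt is made up. head -n 1 keeps just the first matching line, and cut -d/ -f3 takes the text between the second and third slash, which is the username only when the line starts with /users/.

```shell
# Hypothetical input; every line is assumed to start with /users/USERNAME/.
cat > paths.txt <<'EOF'
/users/alice/garbage/otherStuff/
/users/bob/more/stuff/
EOF

# Keep the first matching line, then cut out the third /-separated field.
first_user=$(awk '/users\/.*\//{print $0}' paths.txt | head -n 1 | cut -d/ -f3)

printf '%s\n' "$first_user"
```

Dropping the head stage would return the third field of every matching line instead of only the first.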

Upvotes: 0