Reputation: 6300
This should not be too difficult but I could not find a solution.
I have an HTML file, and I want to extract all URLs that match a specific pattern.
The pattern is /users/<USERNAME>/
- I actually only need the USERNAME.
I got only to this:
awk '/users\/.*\//{print $0}' file
But this prints the complete matching line, and I don't want the whole line.
Even just the whole URL would be fine (e.g. /users/USERNAME/), but I really only need the USERNAME.
Upvotes: 0
Views: 46
Reputation: 784898
If you want to do this in a single awk command, use the match function:
awk -v s="/users/" 'match($0, s "[^/[:blank:]]+") {
print substr($0, RSTART+length(s), RLENGTH-length(s))
}' file
Or else this grep + cut pipeline will do the job (the username is the third /-separated field of each match):
grep -Eo '/users/[^/[:blank:]]+' file | cut -d/ -f3
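Note that match only reports the first occurrence on each line, so the awk version above prints at most one username per line. If a line can hold several links, a loop over the remainder of the line is one way around that. The following is a minimal sketch, not a tested drop-in: the function name extract_users is made up for illustration, and the character class spells out space and tab instead of [:blank:] so that it also runs on awk builds without POSIX class support.

```shell
# Hypothetical helper: print every USERNAME found after /users/, one per line.
extract_users() {
  awk -v s="/users/" '{
    line = $0
    # Keep matching until no /users/NAME occurrence is left on this line.
    while (match(line, s "[^/ \t]+")) {
      # Skip the "/users/" prefix and print only the name part.
      print substr(line, RSTART + length(s), RLENGTH - length(s))
      # Continue searching in the text after the current match.
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Call it as extract_users file. Piping the result through sort -u would remove duplicate names if that matters.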
Upvotes: 2
Reputation: 67467
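Set the field delimiter to /, do a literal match on the second field, and print the third. This works when users is exactly the second /-separated field of the line, e.g. when the line starts with /users/.
$ awk -F/ '$2=="users"{print $3}' file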
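In an HTML file the URL usually sits in the middle of a line (inside an href attribute, for example), so users will often not be field 2. A variant that scans all fields is sketched below. It assumes the trailing slash from the question's pattern is present, since otherwise the following field would include whatever text comes after the name. The function name users_by_field is hypothetical.

```shell
# Hypothetical helper: print the field that follows a field equal to "users".
users_by_field() {
  awk -F/ '{
    # Walk over every "/"-separated field on the line.
    for (i = 1; i < NF; i++)
      # Require an exact "users" field followed by a non-empty name.
      if ($i == "users" && $(i + 1) != "")
        print $(i + 1)
  }' "$@"
}
```

Because the comparison is an exact string match on the whole field, a path such as /superusers/x/ is not picked up.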
Upvotes: 1
Reputation: 84
Assuming your command gives you the entire line, something like
/users/USERNAME/garbage/otherStuff/
You could pipe this result through head assuming you always know that it will be
/users/USERNAME/....
After piping through head, you can also use cut commands to remove more of the end text until you have only the piece you want.
The command will look something like this
awk '/users\/.*\//{print $0}' file | head (options) | cut (options)
Upvotes: 0