Reputation: 459
So I am currently reading from this txt file:
Line 961: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:32 -0400] "GET /elv/hot.gif HTTP/1.0" 200 1007
Line 965: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:41 -0400] "GET /elv/PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
Line 966: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:46 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
Line 969: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:49 -0400] "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244
Line 972: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:51 -0400] "GET /elv/ATLAS_CENTAUR/atlas.gif HTTP/1.0" 200 2286
Line 95219: u139.n72.queensu.ca - - [04/Aug/1995:10:40:04 -0400] "GET /elv HTTP/1.0" 302 -
I am trying to print only the names (basically only the ones in capital letters) on the command line, with no duplicates. For example:
ATLAS_CENTAUR
DELTA
SCOUT
PEGASUS
My code so far:
grep "/elv" ~/spacestation.txt | awk -F/ '{print $5}' | sort -u
Actual output:
1.0" 302
ATLAS_CENTAUR
DELTA
hot.gif HTTP
SCOUT
PEGASUS
Upvotes: 6
Views: 1089
Reputation: 1126
In awk you can do the following:
awk -v FS="/elv/" '{print $2}' file
hot.gif HTTP/1.0" 200 1007
PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
SCOUT/scout.gif HTTP/1.0" 200 1165
DELTA/delta.gif HTTP/1.0" 200 2244
ATLAS_CENTAUR/atlas.gif HTTP/1.0" 200 2286
NR>1 doesn't print the first record:
awk -v FS="/elv/" 'NR>1{print $2}' file
PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
SCOUT/scout.gif HTTP/1.0" 200 1165
DELTA/delta.gif HTTP/1.0" 200 2244
ATLAS_CENTAUR/atlas.gif HTTP/1.0" 200 2286
The split() function gets what you want:
awk -v FS="/elv/" 'NR>1{split($2,a,/\//);print a[1]}' file
PEGASUS
SCOUT
DELTA
ATLAS_CENTAUR
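The NR>1 step works for the sample only because hot.gif happens to be the first record. Below is a minimal sketch of a variation, assuming the log lines sit in a temporary file created just for the demonstration. It keeps the same /elv/ separator and split() call, but filters on an all-capitals name and uses an array (called seen here, an illustrative name) to drop duplicates without sort. The PEGASUS line is repeated on purpose to exercise the duplicate check.

```shell
# Temporary file holding a few lines from the question (path is hypothetical).
log=$(mktemp)
cat > "$log" <<'EOF'
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:32 -0400] "GET /elv/hot.gif HTTP/1.0" 200 1007
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:41 -0400] "GET /elv/PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:46 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:41 -0400] "GET /elv/PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
u139.n72.queensu.ca - - [04/Aug/1995:10:40:04 -0400] "GET /elv HTTP/1.0" 302 -
EOF

names=$(awk -F'/elv/' '
NF > 1 {                       # only records that actually contain "/elv/"
    split($2, a, "/")          # a[1] is the path component right after /elv/
    # keep names made only of capitals and underscores, first occurrence only
    if (a[1] ~ /^[A-Z_]+$/ && !seen[a[1]]++) print a[1]
}' "$log")

printf '%s\n' "$names"
rm -f "$log"
```

Because the filter looks at the name itself, the result should not depend on where the hot.gif record appears in the file.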
Upvotes: 1
Reputation: 376
You need to add a regex pattern to your awk script so that $5 is printed only when it starts with capital letters or underscores:
Solution:
grep "/elv" ~/spacestation.txt | awk -F/ '$5 ~ /^[A-Z_]+/ {print $5}' | sort -u
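To see why the unfiltered $5 also picked up strings such as hot.gif HTTP, it may help to print every field awk produces when it splits on /. A small sketch using the first line from the question (the variable names line and fields are illustrative):

```shell
# One record from the question, split on "/" exactly as awk -F/ does.
line='www-d1.proxy.aol.com - - [01/Aug/1995:00:35:32 -0400] "GET /elv/hot.gif HTTP/1.0" 200 1007'

fields=$(printf '%s\n' "$line" |
    awk -F/ '{ for (i = 1; i <= NF; i++) printf "$%d = [%s]\n", i, $i }')

printf '%s\n' "$fields"
```

The slashes inside the date count as separators too, which is why the name ends up in $5 rather than $3, and why a line without a subdirectory leaves part of the request in that position.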
Upvotes: 2
Reputation: 21
Maybe I'm wrong… but why not simply?
sed -ne 's:.*/elv/\([A-Z_]\+\).*:\1:p' spacestation.txt | sort -u
Upvotes: 2
Reputation: 1996
Single command line:
grep -E '/elv/[A-Z]+' infile | sed -E 's:.*/elv/([A-Z_]+).*:\1:'
If sort is needed/allowed:
grep -E '/elv/[A-Z]+' infile | sed -E 's:.*/elv/([A-Z_]+).*:\1:' | sort -u
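A variation on the same idea, not the method above: grep -o prints only the matching part of each line, so cut can take the place of the sed substitution. This is a sketch under the assumption that every name of interest is a directory, that is, followed by a slash. The sample variable holds lines from the question, with the SCOUT line repeated deliberately.

```shell
sample='www-d1.proxy.aol.com - - [01/Aug/1995:00:35:32 -0400] "GET /elv/hot.gif HTTP/1.0" 200 1007
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:46 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:49 -0400] "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:46 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
u139.n72.queensu.ca - - [04/Aug/1995:10:40:04 -0400] "GET /elv HTTP/1.0" 302 -'

# grep -o emits e.g. "/elv/SCOUT/"; field 3 of that, split on "/", is the name.
names=$(printf '%s\n' "$sample" |
    grep -oE '/elv/[A-Z_]+/' |
    cut -d/ -f3 |
    LC_ALL=C sort -u)

printf '%s\n' "$names"
```

LC_ALL=C is there only to make the sort order predictable across locales.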
Upvotes: 2
Reputation: 133610
Could you please try the following, written and tested with the shown samples only, in GNU awk.
your_command | awk '
match($0,/"GET \/[a-zA-Z]+\/[^/]*/){
  val=substr($0,RSTART,RLENGTH)
  sub(/.*\//,"",val)
  if(val!~/[a-z]/){ print val }
  val=""
}'
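The program above prints a name every time it appears, so duplicates still need to be removed. A minimal sketch of the full pipeline, assuming your_command is the grep from the question and the log is a temporary file built for the demonstration. The slash inside the bracket expression is escaped here as a precaution for awks other than GNU awk, which is an assumption on my part rather than something tested everywhere.

```shell
# Temporary file with lines from the question; the PEGASUS line is repeated on purpose.
log=$(mktemp)
cat > "$log" <<'EOF'
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:32 -0400] "GET /elv/hot.gif HTTP/1.0" 200 1007
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:41 -0400] "GET /elv/PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:46 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:41 -0400] "GET /elv/PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
u139.n72.queensu.ca - - [04/Aug/1995:10:40:04 -0400] "GET /elv HTTP/1.0" 302 -
EOF

names=$(grep "/elv" "$log" | awk '
match($0, /"GET \/[a-zA-Z]+\/[^\/]*/) {
  val = substr($0, RSTART, RLENGTH)   # e.g. "GET /elv/PEGASUS
  sub(/.*\//, "", val)                # strip everything up to the last slash
  if (val !~ /[a-z]/) { print val }   # skip anything containing lowercase
}' | LC_ALL=C sort -u)                # sort -u removes the repeated names

printf '%s\n' "$names"
rm -f "$log"
```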
Upvotes: 2