Kinja

Reputation: 459

Print only words with Capital Letters (Linux)

I am currently reading from this text file:

Line 961: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:32 -0400] "GET /elv/hot.gif HTTP/1.0" 200 1007
Line 965: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:41 -0400] "GET /elv/PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
Line 966: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:46 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
Line 969: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:49 -0400] "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244
Line 972: www-d1.proxy.aol.com - - [01/Aug/1995:00:35:51 -0400] "GET /elv/ATLAS_CENTAUR/atlas.gif HTTP/1.0" 200 2286
Line 95219: u139.n72.queensu.ca - - [04/Aug/1995:10:40:04 -0400] "GET /elv HTTP/1.0" 302 -

I am trying to print only the names (basically, only the path components in capital letters) on the command line, WITH NO DUPLICATES. For example:

ATLAS_CENTAUR
DELTA
SCOUT
PEGASUS

My code so far:

grep "/elv" ~/spacestation.txt | awk -F/ '{print $5}' | sort -u

Actual output:

1.0" 302
ATLAS_CENTAUR
DELTA
hot.gif HTTP
SCOUT
PEGASUS

Upvotes: 6

Views: 1089

Answers (5)

Carlos Pascual

Reputation: 1126

In awk you can do the following.

  • Set the field separator (FS) to "/elv/" and print the second field:

awk -v FS="/elv/" '{print $2}' file

hot.gif HTTP/1.0" 200 1007
PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
SCOUT/scout.gif HTTP/1.0" 200 1165
DELTA/delta.gif HTTP/1.0" 200 2244
ATLAS_CENTAUR/atlas.gif HTTP/1.0" 200 2286
  • Adding NR>1 skips the first record (the hot.gif line):

awk -v FS="/elv/" 'NR>1{print $2}' file

PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
SCOUT/scout.gif HTTP/1.0" 200 1165
DELTA/delta.gif HTTP/1.0" 200 2244
ATLAS_CENTAUR/atlas.gif HTTP/1.0" 200 2286
  • Finally, split() on "/" keeps only the directory name:

awk -v FS="/elv/" 'NR>1{split($2,a,/\//);print a[1]}' file

PEGASUS
SCOUT
DELTA
ATLAS_CENTAUR
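
The NR>1 condition works here only because the unwanted hot.gif line happens to be the first record. A minimal sketch that tests the extracted name instead of the record number, assuming names contain only uppercase letters and underscores. The temporary file and its lines are made up for illustration (the real input would be ~/spacestation.txt), and the seen array is an addition for de-duplication:

```shell
# Hypothetical sample data standing in for ~/spacestation.txt.
log=$(mktemp)
cat > "$log" <<'EOF'
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:41 -0400] "GET /elv/PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:32 -0400] "GET /elv/hot.gif HTTP/1.0" 200 1007
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:46 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
www-d1.proxy.aol.com - - [01/Aug/1995:00:35:47 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
u139.n72.queensu.ca - - [04/Aug/1995:10:40:04 -0400] "GET /elv HTTP/1.0" 302 -
EOF

# Split each record on "/elv/", take the first path component of $2,
# and print it only if it is all uppercase/underscore and not seen before.
names=$(awk -v FS="/elv/" '
  NF > 1 {
    split($2, a, "/")
    if (a[1] ~ /^[A-Z_]+$/ && !seen[a[1]]++) print a[1]
  }' "$log")

printf '%s\n' "$names"
rm -f "$log"
```

With this sample, the hot.gif line is skipped even though it is not the first record, and SCOUT is printed once.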

Upvotes: 1

acakojic

Reputation: 376

You need a regex condition in your awk script to test $5:

Solution:

grep "/elv" ~/spacestation.txt | awk -F/ '$5 ~ /^[A-Z_]+/ {print $5}' | sort -u 
  1. '~' tests whether $5 matches the regex that follows
  2. '^' anchors the match to the start of the field
  3. '[A-Z_]' matches a single uppercase letter or an underscore
  4. '+' repeats the preceding bracket expression one or more times
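
Because the pattern is anchored only at the start, a field that merely begins with an uppercase letter (for example Hot.gif) would also pass. A small sketch comparing it with a variant anchored at both ends, assuming directory names contain only uppercase letters and underscores. The sample lines, including the Hot.gif request, are invented for illustration:

```shell
# Invented sample lines; with -F/ the component after /elv/ is field 5.
sample='[01/Aug/1995:00:35:41 -0400] "GET /elv/PEGASUS/minpeg1.gif HTTP/1.0" 200 1055
[01/Aug/1995:00:35:42 -0400] "GET /elv/Hot.gif HTTP/1.0" 200 1007
[01/Aug/1995:00:35:49 -0400] "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244'

# Start anchor only: any field that begins with an uppercase letter passes.
loose=$(printf '%s\n' "$sample" | awk -F/ '$5 ~ /^[A-Z_]+/ {print $5}' | sort -u)

# Both anchors: the whole field must be uppercase letters or underscores.
strict=$(printf '%s\n' "$sample" | awk -F/ '$5 ~ /^[A-Z_]+$/ {print $5}' | sort -u)

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose result includes "Hot.gif HTTP", while the strict result contains only DELTA and PEGASUS.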

Upvotes: 2

Anthony Gelibert

Reputation: 21

Maybe I'm wrong… but why not simply?

sed -ne 's|.*/elv/\([A-Z_]\+\).*|\1|p' spacestation.txt | sort -u

Upvotes: 2

Milag

Reputation: 1996

Single command line:

grep -E '/elv/[A-Z]+' infile | sed -E 's:.*/elv/([A-Z_]+).*:\1:'

If sort is needed or allowed:

grep -E '/elv/[A-Z]+' infile | sed -E 's:.*/elv/([A-Z_]+).*:\1:' | sort -u
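
Since sed can filter as well as substitute, the grep stage could be folded into it with -n and the p flag. A sketch under that assumption, using an invented four-line sample rather than the real log. The extra "/" after the capture group is also an addition, so that only complete path components match:

```shell
# Invented sample input; "x" stands in for the host and timestamp columns.
sample='x "GET /elv/hot.gif HTTP/1.0" 200 1007
x "GET /elv/ATLAS_CENTAUR/atlas.gif HTTP/1.0" 200 2286
x "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244
x "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244'

# -n suppresses the default output; the trailing p prints only the lines
# where the substitution succeeded, which replaces the separate grep stage.
result=$(printf '%s\n' "$sample" | sed -nE 's:.*/elv/([A-Z_]+)/.*:\1:p' | sort -u)

printf '%s\n' "$result"
```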

Upvotes: 2

RavinderSingh13

Reputation: 133610

Could you please try the following; it was written and tested in GNU awk against the shown samples only.

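your_command | awk '
# Match the request up to the second path component, e.g. "GET /elv/PEGASUS
match($0,/"GET \/[a-zA-Z]+\/[^/]*/){
  val=substr($0,RSTART,RLENGTH)   # the matched text
  sub(/.*\//,"",val)              # strip everything through the last slash
  if(val!~/[a-z]/){  print val  } # keep it only if it has no lowercase letters
  val=""
}'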
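
The script prints every match, so a name requested more than once appears more than once. A sketch that adds the common seen[] idiom to drop repeats. The seen array and the sample input are additions for illustration, not part of the answer as posted, and the slash inside the bracket expression is escaped here only as a portability precaution:

```shell
# Illustrative input: two requests for the same directory plus one lowercase file.
input='host - - [01/Aug/1995:00:35:46 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
host - - [01/Aug/1995:00:35:47 -0400] "GET /elv/SCOUT/scout.gif HTTP/1.0" 200 1165
host - - [01/Aug/1995:00:35:32 -0400] "GET /elv/hot.gif HTTP/1.0" 200 1007
host - - [01/Aug/1995:00:35:49 -0400] "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244'

unique=$(printf '%s\n' "$input" | awk '
match($0, /"GET \/[a-zA-Z]+\/[^\/]*/) {
  val = substr($0, RSTART, RLENGTH)
  sub(/.*\//, "", val)
  # seen[val]++ evaluates to 0 only the first time a given name appears
  if (val !~ /[a-z]/ && !seen[val]++) print val
}')

printf '%s\n' "$unique"
```

Names come out in order of first appearance, so no sort stage is needed unless sorted output is wanted.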

Upvotes: 2
