Reputation: 619

How to extract multiple words from a line using shell script

I need to extract the name of the root CN for each website, stored in a file as shown below:

google.com   CN=Google Internet Authority G2
youtube.com   CN=Google Internet Authority G2

I want to extract the part "Google Internet Authority G2" from each line and count how many times it occurs in the file.

I tried the following command, but I don't know the proper regex to use. Can somebody help?

cat RootCertificates | tr -d '*CN='  | sort | uniq -c
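For reference, the attempt above fails because tr -d deletes each individual character in its set (here *, C, N and =), not the string "CN=", and it leaves the domain column untouched. Every line still begins with a different domain, so uniq -c counts each one separately. (Any uppercase C or N inside a CA name would also be stripped.) A small demonstration on the two sample lines, using a hypothetical file name sample.txt:

```shell
# Recreate the two sample lines (sample.txt is a hypothetical file name).
printf '%s\n' \
  'google.com   CN=Google Internet Authority G2' \
  'youtube.com   CN=Google Internet Authority G2' > sample.txt

# tr -d removes every '*', 'C', 'N' and '=' character wherever it occurs.
# The domain column survives, so each line is still unique and
# uniq -c prints two lines, each with a count of 1.
tr -d '*CN=' < sample.txt | sort | uniq -c
```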

Upvotes: 0

Answers (3)

Reputation: 44434

You can use sed instead.

sed 's/^.*CN=//' < RootCertificates | sort | ..

.. also, try to avoid cat if you can. In this case you can redirect sed's input from your file instead.
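A runnable sketch of the sed answer, assuming the elided part of the pipeline is uniq -c as in the question. It writes the two sample lines to a hypothetical certs_sample.txt so it won't touch your real RootCertificates file:

```shell
# Sample input matching the question (certs_sample.txt is a hypothetical name).
printf '%s\n' \
  'google.com   CN=Google Internet Authority G2' \
  'youtube.com   CN=Google Internet Authority G2' > certs_sample.txt

# ^.*CN= matches from the start of the line through the last "CN="
# (.* is greedy), and the substitution deletes it, keeping only the name.
# The file is given to sed by input redirection, so no cat is needed.
sed 's/^.*CN=//' < certs_sample.txt | sort | uniq -c
```

This should print a single line: a count of 2 followed by Google Internet Authority G2.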

Upvotes: 2

Reputation: 3022

Maybe grep:

  grep -o 'CN=.*' file | sort | uniq -c
  2 CN=Google Internet Authority G2

or, if you don't want the CN= prefix, using cut on your input:

cut -d "=" -f2 file | sort | uniq -c
2 Google Internet Authority G2
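If the file holds several different roots, appending sort -rn lists the most common CN first. A sketch, where the example.org line and its "Example Root CA" value are made up for illustration:

```shell
# Hypothetical sample with two different roots (the example.org line is invented).
printf '%s\n' \
  'google.com   CN=Google Internet Authority G2' \
  'youtube.com   CN=Google Internet Authority G2' \
  'example.org   CN=Example Root CA' > certs_sample.txt

# Count each CN, then sort numerically by count, highest first.
cut -d "=" -f2 certs_sample.txt | sort | uniq -c | sort -rn
```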

Upvotes: 0

Reputation: 42999

If you are guaranteed to have CN= on every line, a simple cut would suffice and there is no need for a regex:

cut -f2 -d= RootCertificates | sort | uniq -c

For your file, the output is:

  2 Google Internet Authority G2
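If CN= is not guaranteed on every line, note that cut prints a line containing no delimiter unchanged, so that line would be counted as well. A sketch of two guards, cut's -s option and sed -n with the p flag; the "broken.example" line is hypothetical:

```shell
# Hypothetical input: the last line has no "=" at all (e.g. a failed lookup).
printf '%s\n' \
  'google.com   CN=Google Internet Authority G2' \
  'youtube.com   CN=Google Internet Authority G2' \
  'broken.example   (no certificate)' > certs_sample.txt

# Plain cut passes the line without "=" through unchanged, so it gets counted.
cut -f2 -d= certs_sample.txt | sort | uniq -c

# -s makes cut drop lines that do not contain the delimiter.
cut -s -f2 -d= certs_sample.txt | sort | uniq -c

# sed -n with the p flag prints only lines where "CN=" was found and stripped.
sed -n 's/^.*CN=//p' certs_sample.txt | sort | uniq -c
```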

Upvotes: 0