Patrick

Reputation: 303

Grep Perl Regex and Capture Groups

I'm trying to grab the SSL certificate information using this command:

openssl s_client -connect gcm-http.googleapis.com:443

which returns the SSL certificate information. I'm trying to filter that output with grep -P, but I'm having trouble figuring out 1) the proper regular expression, and 2) how to make grep return just the matched part. So far, grep with either of the regexes below returns nothing.

Here is the information I'm operating against:

(More unrelated data - Truncated)
---
Certificate chain
 0 s:/C=US/ST=California/L=Mountain View/O=Google Inc/CN=*.googleapis.com
   i:/C=US/O=Google Inc/CN=Google Internet Authority G2
 1 s:/C=US/O=Google Inc/CN=Google Internet Authority G2
   i:/C=US/O=GeoTrust Inc./CN=GeoTrust Global CA
 2 s:/C=US/O=GeoTrust Inc./CN=GeoTrust Global CA
   i:/C=US/O=Equifax/OU=Equifax Secure Certificate Authority
---
Server certificate

-----BEGIN CERTIFICATE-----
MIIE3TCCA8WgAwIBAgIISZPzqn6Rx/0wDQYJKoZIhvcNAQELBQAwSTELMAkGA1UE
BhMCVVMxEzARBgNVBAoTCkdvb2dsZSBJbmMxJTAjBgNVBAMTHEdvb2dsZSBJbnRl
cm5ldCBBdXRob3JpdHkgRzIwHhcNMTcwNzI1MDgyOTQ0WhcNMTcxMDE3MDgyNzAw
WjBqMQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwN
TW91bnRhaW4gVmlldzETMBEGA1UECgwKR29vZ2xlIEluYzEZMBcGA1UEAwwQKi5n
b29nbGVhcGlzLmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAL50
UZFxROM8NwIcSTC9V6XAJkoCcW+xuLtYFUbP+6zomzzxYXtTjh+s33XvtaHoNk1S
WxBqSX+0YsS1RHzwWT4KwJpkEyrqJ/WDtKs3hQY27Lng6IZmAYomoRXNZBWgXdQ0
sBddBU9/HtpKu0RpL6qM+7y7Xpp8KHilqPfjvtc8eljvOAdU3RA3w1p2JIov+F5n
sbD1bMqq3Xx6wbT7FLhzL8P/+g1NI0DC/fzSqW+pS/RLljQGLJrlvfmrV++i69Yg
pFRHPvTo85171cLjvHNv730SkM4W9SA7oHU+xzmANrT+p/ikcEJrcMnR9pKf08ON
pN9UgsEff7BZE0jvlu0CAwEAAaOCAaYwggGiMB0GA1UdJQQWMBQGCCsGAQUFBwMB
BggrBgEFBQcDAjB0BgNVHREEbTBrghAqLmdvb2dsZWFwaXMuY29tghUqLmNsaWVu
dHM2Lmdvb2dsZS5jb22CGCouY2xvdWRlbmRwb2ludHNhcGlzLmNvbYIWY2xvdWRl
bmRwb2ludHNhcGlzLmNvbYIOZ29vZ2xlYXBpcy5jb20waAYIKwYBBQUHAQEEXDBa
MCsGCCsGAQUFBzAChh9odHRwOi8vcGtpLmdvb2dsZS5jb20vR0lBRzIuY3J0MCsG
CCsGAQUFBzABhh9odHRwOi8vY2xpZW50czEuZ29vZ2xlLmNvbS9vY3NwMB0GA1Ud
DgQWBBRQbPBTOA3tVXQWc4iuJyyz5dGWMzAMBgNVHRMBAf8EAjAAMB8GA1UdIwQY
MBaAFErdBhYbvPZotXb1gba7Yhq6WoEvMCEGA1UdIAQaMBgwDAYKKwYBBAHWeQIF
ATAIBgZngQwBAgIwMAYDVR0fBCkwJzAloCOgIYYfaHR0cDovL3BraS5nb29nbGUu
Y29tL0dJQUcyLmNybDANBgkqhkiG9w0BAQsFAAOCAQEAeClOfrviHl9sZAVSTfYB
5FuIDKeSJHibXtjHSNsUP+JaAB9x1ABDczyLYWD/4PaD2w8jRXPXcVcqUaQPqyjF
1um/H/+Eb8+qfwl+Q3RiBAgGgAPw+s6GZK/kGfF9CNPbwhPXizYS6BZZ880/x3ec
Em0F+i0NbHsufPg4ghtJr2gFC2NWHwhvZtezbQDR2z8ePu1r3hyFwgotefCFsQJv
zAbVOvXsqHZdom3BLVwkANeh5hRfeW04N48bRVMZo9A0cULTg5LM1AOXGeLbp86z
D3RHbwtbRBGp2HUjfpt8FqeMzd+DxGlQXEc7l8aFwOgIFvWRJv+SHCXVT3rRHGD+
wA==
-----END CERTIFICATE-----

....
(More unrelated data - Truncated)

I've tried both of these regexes:

grep -P '((?:-+BEGIN CERTIFICATE-+\n)(.+\n)*(?:-+END CERTIFICATE-+))'

grep -P '(?:-+BEGIN CERTIFICATE-+\n)(.+\n)(?:-+END CERTIFICATE-+)'

Essentially, I want to return only the certificate itself, without the -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- lines.

I know there's probably a better way to write the regex, but I've tested it on regexr.com and regex101.com, and it works there.

Even if grep is capturing it successfully, running echo $1 afterwards prints nothing.

Upvotes: 2

Views: 1840

Answers (3)

randomir

Reputation: 18697

Just for the record, here's a grep command that will extract only the certificate:

grep -zoPe '--BEGIN.*\n\K[^-]+' file | head -c-1

The trick is the -z/--null-data option: input records are terminated by \0 instead of newline, so the whole input is treated as a single record and \n in the pattern can actually match. The -o option prints only the matched part, and -e is needed because the pattern starts with a dash. We also use PCRE (-P) and its match-start reset escape \K, which drops everything matched so far from the reported match (we need only the part after --BEGIN...\n and before the first - of the END line).

The head -c-1 will remove the very last character, which is a newline for older greps (e.g. GNU grep v2.12) and a null byte for newer greps (e.g. GNU grep v2.25).
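As a minimal sketch of the difference between line mode and null-data mode, the snippet below runs the same pattern both ways. It assumes GNU grep built with PCRE support; the sample file and its two base64 lines are placeholders, not a real certificate. Because the result is stored in a shell variable, the trailing NUL is removed with tr here instead of head -c-1.

```shell
# Hypothetical sample input; the base64 lines are placeholders.
sample=$(mktemp)
printf '%s\n' \
  'Server certificate' \
  '-----BEGIN CERTIFICATE-----' \
  'QUJD' \
  'REVG' \
  '-----END CERTIFICATE-----' \
  'trailing data' > "$sample"

# Line mode: grep tests one line at a time, so a pattern that needs "\n"
# should find nothing (grep then exits non-zero, hence "|| true").
line_mode=$(grep -oP -e '--BEGIN.*\n\K[^-]+' "$sample" || true)

# Null-data mode: the whole file is a single record, so the match can span lines.
# tr strips the NUL terminator; $(...) strips the trailing newline.
null_mode=$(grep -zoP -e '--BEGIN.*\n\K[^-]+' "$sample" | tr -d '\0')

printf 'line mode: [%s]\n' "$line_mode"
printf 'null mode:\n%s\n' "$null_mode"
rm -f "$sample"
```

The same pattern can be fed from a pipe instead of a file, for example from the openssl s_client command in the question.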

Upvotes: 6

Walter A

Reputation: 20032

Use sed:

sed -n '/----BEGIN CERTIFICATE-----/,/----END CERTIFICATE-----/ p' inputfile |
   sed '1d;$d'

EDIT: Missed "CERTIFICATE"

Or use awk:

awk '/----END CERTIFICATE-----/ {pr=0;}
     pr==1 {print}
     /----BEGIN CERTIFICATE-----/ {pr=1;}' inputfile
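A small sketch of both variants reading from standard input, so they can sit at the end of a pipeline. The function names cert_body_sed and cert_body_awk and the sample input are made up for illustration; the base64 lines are placeholders.

```shell
# Placeholder input standing in for the openssl output.
input=$(printf '%s\n' \
  'noise' \
  '-----BEGIN CERTIFICATE-----' \
  'QUJD' \
  'REVG' \
  '-----END CERTIFICATE-----' \
  'more noise')

# sed: print the BEGIN..END range, then delete its first and last line.
cert_body_sed() {
  sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' | sed '1d;$d'
}

# awk: switch printing off at END, print while on, switch on at BEGIN.
# The rule order is what keeps both marker lines out of the output.
cert_body_awk() {
  awk '/-----END CERTIFICATE-----/ {pr=0}
       pr==1 {print}
       /-----BEGIN CERTIFICATE-----/ {pr=1}'
}

sed_out=$(printf '%s\n' "$input" | cert_body_sed)
awk_out=$(printf '%s\n' "$input" | cert_body_awk)
printf 'sed:\n%s\nawk:\n%s\n' "$sed_out" "$awk_out"
```

If the input contains several certificate blocks, the two differ: the sed version only drops the very first and very last marker line, while the awk version drops every marker line.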

Upvotes: 1

PerlDuck

Reputation: 5728

I didn't manage to get this working with grep, but have a Perl solution:

perl -0777 -n -e \
    'print $1 if /-+BEGIN CERTIFICATE-+\n(.+\n)*-+END CERTIFICATE-+/s' \
    cert.txt

This will print everything between the first "BEGIN..." and the last "END...".

Update:

@brian d foy wrote an article about the "exclusive flip-flop operator". According to that article this also works:

perl -n -e \
    'print if ($rc = /-+BEGIN CERTIFICATE/ .. /-+END CERTIFICATE-+/ and $rc !~ /(^1|E0)$/)' cert.txt 

Upvotes: 6
