Simplify text processing pipeline with awk

Question

I have the following text data (highly simplified):

dn: cn=config
objectClass: olcGlobal
cn: config
some: properties

dn: cn={0}kerberos,cn=schema,cn=config
objectClass: olcSchemaConfig
cn: {0}kerberos
some: properties
some: junk
some: more junk

dn: olcDatabase={-1}frontend,cn=config
objectClass: olcDatabaseConfig
some: properties

The desired output is:

dn: cn=kerberos,cn=schema,cn=config
objectClass: olcSchemaConfig
cn: kerberos
some: properties

I have written the following shell pipeline to achieve this:

awk -vRS= -vFS="\n" '/kerberos/{print $0}' /tmp/input.txt | \
    sed 's/{0}kerberos/kerberos/' | \
    sed '/some: junk/,$d'

This works just fine, but I feel like it's 'cheating' mixing awk and sed. How can I implement this using a single awk script?

Jonathan Leffler · Accepted Answer

Clearly, you only need one sed command, not two:

sed -e 's/{0}kerberos/kerberos/' -e '/some: junk/,$d'

Unless you insist on using a C shell, the backslashes at the ends of the lines are unnecessary.
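
As a rough sketch of that point (not part of the original answer), the question's pipeline could be written without the trailing backslashes, because a Bourne-style shell continues a command whose line ends in |. The temporary file and the variable names input and result are illustrative assumptions; the sample data is copied from the question, and the FS setting is dropped because only $0 is printed.

```shell
# Sample data from the question, written to a temporary file (illustrative path).
input=$(mktemp)
cat > "$input" <<'EOF'
dn: cn=config
objectClass: olcGlobal
cn: config
some: properties

dn: cn={0}kerberos,cn=schema,cn=config
objectClass: olcSchemaConfig
cn: {0}kerberos
some: properties
some: junk
some: more junk

dn: olcDatabase={-1}frontend,cn=config
objectClass: olcDatabaseConfig
some: properties
EOF

# A trailing | is enough for the shell to keep reading the next line;
# no backslash is required. The two sed scripts are combined with -e.
result=$(awk -v RS= '/kerberos/' "$input" |
    sed -e 's/{0}kerberos/kerberos/' -e '/some: junk/,$d')

printf '%s\n' "$result"
rm -f "$input"
```

With the sample data, this should print the four lines shown as the desired output in the question.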

You could do it all in a single sed command:

sed -n -e '/kerberos/,/^$/{
        s/{0}kerberos/kerberos/
        /some: junk/,$d; p;}'

which could be flattened onto a single line by adding a semicolon after the s/// substitution:

sed -n -e '/kerberos/,/^$/{ s/{0}kerberos/kerberos/; /some: junk/,$d; p; }'

The semicolon before the } is needed with sed on Mac OS X (BSD); GNU sed is happy without it.

You can do it all in awk too:

awk '/kerberos/,/^$/ { sub(/\{0\}kerberos/,"kerberos");
                       # Keep only the first "some:" line; skip the rest.
                       if ($0 ~ /^some:/ && some++ > 0) next;
                       # Suppress the blank line that ends the stanza.
                       if ($0 != "") print
                     }' input.txt

which, for the input data, produces:

dn: cn=kerberos,cn=schema,cn=config
objectClass: olcSchemaConfig
cn: kerberos
some: properties
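
If you would rather keep the paragraph mode (RS=) from the question, a variant along the following lines might also work. This is only a sketch under assumptions, not something from the answer above: it treats each blank-line-separated stanza as one record, splits it into lines with FS set to a newline, and stops at the first "some: junk" line, mirroring the sed version. The temporary file and the names input, result, and line are illustrative.

```shell
# Sample data from the question, written to a temporary file (illustrative path).
input=$(mktemp)
cat > "$input" <<'EOF'
dn: cn=config
objectClass: olcGlobal
cn: config
some: properties

dn: cn={0}kerberos,cn=schema,cn=config
objectClass: olcSchemaConfig
cn: {0}kerberos
some: properties
some: junk
some: more junk

dn: olcDatabase={-1}frontend,cn=config
objectClass: olcDatabaseConfig
some: properties
EOF

# RS= selects paragraph mode: each stanza is one record.
# FS='\n' makes every line of the stanza a separate field.
result=$(awk -v RS= -v FS='\n' '
/kerberos/ {
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^some: junk/) break         # drop this line and all that follow
        line = $i                             # work on a copy of the field
        sub(/\{0\}kerberos/, "kerberos", line)
        print line
    }
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

For the sample data this should give the same four lines as above. Unlike the range-pattern version, it depends on the stanzas being separated by blank lines.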
