J Goldman

Reputation: 21

Using awk or sed to merge / print lines matching a pattern (oneliner?)

I have a file that contains the following text:

subject:asdfghj
subject:qwertym
subject:bigger1
subject:sage911
subject:mothers
object:cfvvmkme
object:rjo4j2f2
object:e4r234dd
object:uft5ed8f
object:rf33dfd1

I am hoping to achieve the following result using awk or sed (a one-liner would be a bonus; a Perl one-liner would be acceptable as well):

subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1

I'd like each 'subject' line combined with the corresponding 'object' line, in the order in which they are listed, separated by a comma. May I see an example of this done with awk, sed, or perl (preferably as a one-liner, if possible)?

I have tried using awk for this (I should add that I am still learning):

awk '{if ($0 ~ /subject/) pat1=$1; if ($0 ~ /object/) pat2=$2} {print $0,pat2}'

But it does not do what I thought it would, so I know I have the syntax wrong. Seeing a working example would greatly help me learn.

Upvotes: 0

Views: 122

Answers (4)

Ed Morton

Reputation: 203607

Since you specifically asked for a "oneliner", I assume brevity is far more important to you than clarity, so:

$ awk -F: -v OFS=, 'NR>1&&$1!=p{f=1}{p=$1}f{print a[++c],$0;next}{a[NR]=$0}' file
subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1
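For readers who do value clarity, here is a more readable sketch of the same pairing idea. It is not the one-liner above but a simpler variant: it assumes every subject line appears before the object line it pairs with, and the function name merge_pairs and the temporary file are made up for illustration.

```shell
# Sample input from the question, written to a temporary file.
infile=$(mktemp)
printf '%s\n' \
  subject:asdfghj subject:qwertym subject:bigger1 subject:sage911 subject:mothers \
  object:cfvvmkme object:rjo4j2f2 object:e4r234dd object:uft5ed8f object:rf33dfd1 \
  > "$infile"

# merge_pairs is a hypothetical helper name, not part of the answer above.
merge_pairs() {
  awk '
    /^subject:/ { subj[++n] = $0; next }    # remember each subject line, in order
    /^object:/  { print subj[++i] "," $0 }  # pair object number i with subject number i
  ' "$1"
}

merge_pairs "$infile"
```

Storing the subject lines in an array and indexing them with a second counter keeps the pairing explicit, at the cost of holding all subject lines in memory.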

Upvotes: 1

Benjamin W.

Reputation: 52152

grep, paste and process substitution

$ paste -d , <(grep 'subject' infile) <(grep 'object' infile)
subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1

This treats the output of grep 'subject' infile and grep 'object' infile like files due to process substitution (<( )), then pastes the results together with paste, using a comma as the delimiter (indicated by -d ,).
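Process substitution is a feature of bash, ksh and zsh rather than of plain POSIX sh. As a rough sketch, the same result can be had with temporary files; the patterns are also anchored here so that only lines starting with the prefix match. The variable names and the use of mktemp are assumptions made for this example.

```shell
# Sample input from the question, written to a temporary file.
infile=$(mktemp)
printf '%s\n' \
  subject:asdfghj subject:qwertym subject:bigger1 subject:sage911 subject:mothers \
  object:cfvvmkme object:rjo4j2f2 object:e4r234dd object:uft5ed8f object:rf33dfd1 \
  > "$infile"

# Split the two kinds of lines into separate temporary files.
subj=$(mktemp)
obj=$(mktemp)
grep '^subject:' "$infile" > "$subj"
grep '^object:'  "$infile" > "$obj"

# Join line N of the first file with line N of the second, comma-separated.
merged=$(paste -d , "$subj" "$obj")
printf '%s\n' "$merged"

rm -f "$subj" "$obj"
```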

sed

The idea is to read and store all subject lines in the hold space, then for each object line fetch the hold space, get the proper subject and put the remaining subject lines back into hold space.

First the unreadable oneliner:

$ sed -rn '/^subject/H;/^object/{G;s/\n+/,/;s/^(.*),([^\n]*)(\n|$)/\2,\1\n/;P;s/^[^\n]*\n//;h}' infile
subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1

-r enables extended regular expressions (no need to escape parentheses, + and |), and -n suppresses the default printing of the pattern space.

Expanded, more readable and explained:

/^subject/H         # Append subject lines to hold space
/^object/ {         # For each object line
    G               # Append hold space to pattern space
    s/\n+/,/        # Replace first group of newlines with a comma

    # Swap object (before comma) and subject (after comma)
    s/^(.*),([^\n]*)(\n|$)/\2,\1\n/

    P               # Print up to first newline
    s/^[^\n]*\n//   # Remove first line (can't use D because there is another command)
    h               # Copy pattern space to hold space
}

Remarks:

  • When the hold space is fetched for the first time, it starts with a newline (H adds one), so the newline-to-comma substitution replaces one or more newlines, hence the \n+: two newlines for the first time, one for the rest.
  • To anchor the end of the subject part in the swap, we use (\n|$): either a newline or the end of the pattern space – this is to get the swap also on the last line, where we don't have a newline at the end of the pattern space.
  • This works with GNU sed. For BSD sed as found in macOS, there are some changes required:
    • The -r option has to be replaced by -E.
    • There has to be an extra semicolon before the closing brace: h;}
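    • To insert a newline in the replacement string (swap command), we have to replace \n by a backslash followed by a literal newline, for example by writing \'$'\n'' in place of \n in bash. Command substitution such as "$(printf '\n')" does not help here, because it strips trailing newlines.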
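The expanded script can also be kept in a file and run with -f instead of being squeezed onto one line. The following is a minimal sketch that assumes GNU sed (it relies on \n in the replacement and inside bracket expressions); the file names come from mktemp and are only for illustration.

```shell
# Sample input from the question, written to a temporary file.
infile=$(mktemp)
printf '%s\n' \
  subject:asdfghj subject:qwertym subject:bigger1 subject:sage911 subject:mothers \
  object:cfvvmkme object:rjo4j2f2 object:e4r234dd object:uft5ed8f object:rf33dfd1 \
  > "$infile"

# Write the expanded sed script to its own file (quoted heredoc: no shell expansion).
script=$(mktemp)
cat > "$script" <<'EOF'
/^subject/H
/^object/ {
    G
    s/\n+/,/
    s/^(.*),([^\n]*)(\n|$)/\2,\1\n/
    P
    s/^[^\n]*\n//
    h
}
EOF

# -E: extended regex (GNU sed accepts it as well as -r), -n: no automatic printing.
sed_out=$(sed -E -n -f "$script" "$infile")
printf '%s\n' "$sed_out"
```

Note that the swap assumes the lines themselves contain no commas, since the greedy (.*) would otherwise match up to the last one.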

Upvotes: 1

Sobrique

Reputation: 53478

I'd do it something like this in perl:

#!/usr/bin/perl

use strict;
use warnings;

my @subjects;
while ( <DATA> ) { 
    m/^subject:(\w+)/ and push @subjects, $1; 
    m/^object:(\w+)/ and print "subject:",shift @subjects,",object:", $1,"\n";
}


__DATA__
subject:asdfghj
subject:qwertym
subject:bigger1
subject:sage911
subject:mothers
object:cfvvmkme
object:rjo4j2f2
object:e4r234dd
object:uft5ed8f
object:rf33dfd1

Reduced to a one-liner, this would be:

perl -ne '/^(subject:\w+)/ and push @s, $1; /^object/ and print shift(@s), ",", $_' file

Upvotes: 1

karakfa

Reputation: 67507

Not Perl or awk, but easier:

$ pr -2ts, file
subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1

Explanation

-2 print the input in 2 columns

t omit the page header (filename, date, page number, etc.)

s, use a comma as the column separator
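Since pr pairs lines purely by position and never looks at their content, it relies on the file holding the same number of subject and object lines and nothing else. A quick sanity check before running it might look like this sketch; the variable names and the temporary file are assumptions for the example.

```shell
# Sample input from the question, written to a temporary file.
infile=$(mktemp)
printf '%s\n' \
  subject:asdfghj subject:qwertym subject:bigger1 subject:sage911 subject:mothers \
  object:cfvvmkme object:rjo4j2f2 object:e4r234dd object:uft5ed8f object:rf33dfd1 \
  > "$infile"

# Count each kind of line; pr only lines them up if the counts agree.
n_subj=$(grep -c '^subject:' "$infile")
n_obj=$(grep -c '^object:' "$infile")

if [ "$n_subj" -eq "$n_obj" ]; then
  pr_out=$(pr -2ts, "$infile")
  printf '%s\n' "$pr_out"
else
  echo "subject/object counts differ: $n_subj vs $n_obj" >&2
fi
```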

Upvotes: 4
