user3353722
user3353722

Reputation: 83

Turning multiple lines into one line based on common fields (group by)

I have fixed length file (sorted) and need to concatenate field from multiple rows based on key fields. The key fields are first two columns. The first 3 rows has white spaces at the beginning. The actual file does not have "spaces" word I have put that here to keep the formatting.

Input file

spaces2   001  111  111 Company Code         227                                
spaces2   001  111  111 Plan Code            012                                 
spaces2   001  111  111 Invoice Number       2014010                            
0000002     2  111  111 Company Code         214                                
0000002     2  111  111 Plan Code            20                                 
0000002     2  111  111 Invoice Number       3014010                            

Desired output:

spaces2   001  111  111 Company Code         227-12-2014010                                
0000002     2  111  111 Company Code         214-20-3014010                                

Could someone please suggest how to do this.

Upvotes: 2

Views: 398

Answers (1)

jaypal singh
jaypal singh

Reputation: 77185

Try something like this:

$ cat file
      2   001  111  111 Company Code         227
      2   001  111  111 Plan Code            012
      2   001  111  111 Invoice Number       2014010
0000002     2  111  111 Company Code         214
0000002     2  111  111 Plan Code            20
0000002     2  111  111 Invoice Number       3014010

$ awk '(($1,$2) in a){a[$1,$2]=a[$1,$2]"-"$NF;next}{gsub(/ *$/,"");a[$1,$2]=$0}END{for(x in a) print a[x]}' file
      2   001  111  111 Company Code         227-012-2014010
0000002     2  111  111 Company Code         214-20-3014010

Note that arrays in awk are associative so the order of output will be random. If order is important then it needs to be handled with additional code.

If order is critical for you then you can do something like:

awk '($1,$2) in seen {
    lines[i] = lines[i]"-"$NF; 
    next
}
{
    gsub(/ *$/,""); 
    seen[$1,$2]++; 
    lines[++i] = $0
}
END {
    for(line = 1; line <= length(lines); line++) {
        print lines[line]
    }
}' file

Explaination:

  • ($1,$2) in seen: This looks for first two fields in array seen. If it is present, it appends the last field to array line and next skips rest of the action.
  • { .. } - This code block gets executed for lines that are not in array seen. Typicall your first line will always enter here. So we use gsub to remove trailing space and populate the array seen and store that line in array line.
  • In the END block, we iterate over the array line and print it. Since we use incremental index, the order is preserved. Note: This expects your input data to be ordered

Upvotes: 2

Related Questions