wilssssssslam

Reputation: 85

How to merge duplicate lines into the same row with a primary key and more than one column of information

Here is my data:

NAME1,NAME1_001,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
NAME1,NAME1_003,LIC000_1,NULL,NULL,NULL,NULL
NAME2,NAME2_001,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1
NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_006,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME4,NAME4_002,NULL,LIC100_1,NULL,LIC300-3,LIC300-6

Expected result:

NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6

I tried the command below, but I have no idea how to add the details ($3 to $7):

awk '
    BEGIN{FS=","; OFS="|"}; 
    { arr[$1] = arr[$1] == ""? $2 : arr[$1] "|" $2 }   
    END {for (i in arr) print i, arr[i] }' file.csv

Any suggestions? Thanks!

Upvotes: 6

Views: 230

Answers (3)

Ed Morton

Reputation: 203522

Assuming your input is grouped by the key field as shown in your example (if it isn't, sort it first), you don't need to store the whole file in memory or read it twice, and this will output the lines in the same order they appear in the input:

$ cat tst.awk
BEGIN { FS=","; OFS="|" }
$1 != prev {
    if (NR>1) {
        print rec
    }
    prev = rec = $1
}
{
    $1 = ""
    rec = rec $0
}
END { print rec }

$ awk -f tst.awk file
NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6
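If the input isn't already grouped by the key, per the note above, putting a `sort` on the first comma-separated field in front of the same logic restores that precondition. A condensed sketch (the `ungrouped` file name is just for the demo):

```shell
# Ungrouped sample: NAME1 and NAME3 rows are interleaved.
cat > ungrouped <<'EOF'
NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME1,NAME1_001,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
NAME1,NAME1_003,LIC000_1,NULL,NULL,NULL,NULL
NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1
EOF

# Group on the first comma-separated field, then merge as above.
sort -t, -k1,1 ungrouped | awk '
BEGIN { FS=","; OFS="|" }
$1 != prev { if (NR > 1) print rec; prev = rec = $1 }
{ $1 = ""; rec = rec $0 }
END { print rec }'
```

Note that without `-s`, `sort` breaks ties on the whole line, which is harmless here because the detail columns of each group already sort in the desired order.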

Upvotes: 1

RavinderSingh13

Reputation: 133518

Could you please try the following? Written and tested in GNU awk with the samples shown.

awk '
BEGIN{
  FS=","
  OFS="|"
}
FNR==NR{
  first=$1
  $1=""
  sub(/^\|/,"")
  arr[first]=(first in arr?arr[first] OFS:"")$0
  next
}
($1 in arr){
  print $1,arr[$1]
  delete arr[$1]
}
' Input_file  Input_file

Explanation: a detailed, line-by-line breakdown of the above.

awk '                       ##Starting awk program from here.
BEGIN{                      ##Starting BEGIN section of this program from here.
  FS=","                    ##Setting FS as comma here.
  OFS="|"                   ##Setting OFS as | here.
}
FNR==NR{                    ##FNR==NR is TRUE while Input_file is being read for the first time.
  first=$1                  ##Saving the 1st field as first here.
  $1=""                     ##Nullifying the first field; awk rebuilds the record with OFS.
  sub(/^\|/,"")             ##Removing the leading OFS left behind by the rebuild.
  arr[first]=(first in arr?arr[first] OFS:"")$0  ##Appending the record to arr[first], OFS-separated.
  next                      ##next skips all further statements for this line.
}
($1 in arr){                ##On the second pass: if the 1st field is present in arr then do the following.
  print $1,arr[$1]          ##Printing the 1st field and the arr value, OFS-separated.
  delete arr[$1]            ##Deleting the arr item so each key is printed only once.
}
' Input_file  Input_file    ##Mentioning the Input_file name twice for the two passes.

Upvotes: 5

James Brown

Reputation: 37404

Another awk:

$ awk '
BEGIN {               # set the field separators
    FS=","
    OFS="|"
}
{
    if($1 in a) {     # if $1 already has an entry in the hash
        t=$1          # store the key temporarily
        $1=a[$1]      # replace $1 with the stored record ($0 is rebuilt with OFS)
        a[t]=$0       # and hash the combined record
    } else {          # if $1 seen for the first time
        $1=$1         # rebuild record to change the separators
        a[$1]=$0      # and hash the record
    }
}
END {                 # afterwards
    for(i in a)       # iterate a 
        print a[i]    # and output
}' file
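One caveat: `for (i in a)` visits keys in an unspecified order, so the merged rows may not come out in input order. If the order matters, one option (not part of the original answer) is to sort the result; since each output line starts with its key, a plain `sort` suffices. A small self-contained demo (`file2` is just for the demo):

```shell
# Minimal sample with one duplicated key.
printf '%s\n' \
  'NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1' \
  'NAME2,NAME2_001,LIC000_1,NULL,LIC400_2,NULL,NULL' \
  'NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL' > file2

# Merge as in the answer, then pin the output order by key.
awk 'BEGIN { FS=","; OFS="|" }
{ if ($1 in a) { t=$1; $1=a[$1]; a[t]=$0 } else { $1=$1; a[$1]=$0 } }
END { for (i in a) print a[i] }' file2 | sort
```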

Upvotes: 5
