Kavitha
Kavitha

Reputation: 151

Format output if column 2 field has more than one value

I have data which is colon delimeted as shown:

Joe:23;23;56:zz
Jim:44;44:cz
Rob:45;98:fc

In column 2 if there are more than one value then they need to print separately. Duplicates should be removed and only unique values should also print.

I tried this to remove duplicates:

sort -u -t : -k 2,2 file_name 
Output:
Joe:23;23;56:zz
Jim:44;44:cz
Rob:45;98:fc

Desired Output:

Jim:44:cz

Below ones need to print separately because column 2 has more than one value or we can append this output to other file.txt 
Joe:23;56:zz
Rob:45;98:fc

Upvotes: 3

Views: 163

Answers (4)

RavinderSingh13
RavinderSingh13

Reputation: 133680

Could you please try following. This will create 2 output files where one will have lines which have 2 values in 2nd column and other output file will have other than 2 values in 2nd column. Output file names will be out_file_two_cols and out_file_more_than_two_cols you could change it as per your need.

awk '
BEGIN{
  FS=OFS=":"
}
{
  delete a
  val=""
  num=split($2,array,";")
  for(j=1;j<=num;j++){
    if(!a[array[j]]++){
       val=(val?val ";":"")array[j]
    }
  }
  $2=val
  num=split($2,array,";")
}
num==1{
  print > ("out_file_two_cols")
  next
}
{
  print > ("out_file_more_than_two_cols")
}
' Input_file

Explanation: setting field separator and output field separator as : here for all lines of Input_file in BEGIN section. Then in main section deleting array named a and nullifying variable val, which will be explained further and being used by program in later section, deleting them to avoid conflict of getting their previous values here.

Splitting 2nd field into array by putting delimiter as ; and taking its total number of elements in num variable here. Now running for loop from 1 to till value of num here to traverse through all elements of 2nd field.

Checking condition if current value of 2nd field our of all elements not present in array a then add it on variable val and keep doing this for all elements of 2nd field.

Then assigning value of val to 2nd column. Now again checking now how many element present in new 2nd column by splitting it and num will tell us the same.

Then checking condition if num is 1 means current/edited 2nd field has only 1 element then print it one field output file else print it in other output file.

Upvotes: 4

markp-fuso
markp-fuso

Reputation: 35156

Source data:

$ cat colon.dat
Joe:23;23;56:zz
Jim:44;44:cz
Rob:45;98:fc

One awk solution:

awk -F":" '                                        # input field separator is colon
BEGIN   { OFS=FS }                                 # output field separate or colon

        { n=split($2,arr,";")                      # split field 2 by semi-colon
          m=n                                      # copy our array count
          delete seen                              # reset seen array
          for ( i=1 ; i<=n ; i++ ) {               # loop through array indices
            if ( arr[i] in seen ) {                # if entry has been seen then ...
                delete arr[i]                      #     remove from array and ....
                m--                                #     decrement our total count
            }
            else {            
                seen[arr[i]]                       # otherwise add element to the seen arrray
            }
          }

          outf="single.out"                        # output file for single entries

          if ( m >= 2 ) {
                outf="multiple.out"                # output file for multiple entries
          }

          printf "%s%s", $1, OFS > outf            # print header

          sep=""                                   # separator for first field is empty string

          for ( i in arr ) {                       # print remaining array elements
                printf "%s%s", sep, arr[i] > outf
                sep=";"                            # set separator to semi-colon for fields 2+
          }

          printf "%s%s\n", OFS, $3 > outf          # print trailer
        }
' colon.dat

NOTE: Remove comments to declutter code.

The above generates the following:

$ cat single.out
Jim:44:cz
$ cat multiple.out
Joe:23;56:zz
Rob:45;98:fc

Upvotes: 3

karakfa
karakfa

Reputation: 67537

another awk

$ awk 'function join() {
         s=sep="";
         for(k in b) {s=s sep k; sep=";"}
         return s}

         BEGIN {FS=OFS=":"}
               {n=split($2,a,";");
                delete b;
                for(i=1;i<=n;i++) b[a[i]]
                $2=join()
                if(length(b)==1) print;
                else {multi=multi ORS $0}}

          END  {print "\nmultiple values:" multi}' file

Jim:44:cz

multiple values:
Joe:23;56:zz
Rob:45;98:fc

Upvotes: 4

glenn jackman
glenn jackman

Reputation: 247042

Here's a little script

#!/usr/bin/env ruby
input = File.new "file_name"
single = File.new "one_value", "w"
multiple = File.new "two_value", "w"

input.each do |line|
    fields = line.split ":"
    value = fields[1].split(";").uniq.sort
    fields[1] = value.join ";"
    new_line = fields.join ":"
    if value.size == 1
        single << new_line
    else
        multiple << new_line
    end
end

Upvotes: 4

Related Questions