Reputation: 151
I have data which is colon-delimited, as shown:
Joe:23;23;56:zz
Jim:44;44:cz
Rob:45;98:fc
If column 2 contains more than one value, those lines need to be printed separately. Duplicates within column 2 should be removed so that only unique values remain.
I tried this to remove duplicates:
sort -u -t : -k 2,2 file_name
Output:
Joe:23;23;56:zz
Jim:44;44:cz
Rob:45;98:fc
Desired Output:
Jim:44:cz
The lines below need to be printed separately (or appended to another file, e.g. file.txt) because column 2 has more than one value:
Joe:23;56:zz
Rob:45;98:fc
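For reference, `sort -u -t : -k 2,2` de-duplicates whole lines by their column-2 key; it never touches duplicate values inside a field, which is why the attempt above prints the input unchanged. A minimal demonstration (the extra Ann line is made up for illustration only; it shares Jim's key "44;44"):

```shell
printf '%s\n' 'Joe:23;23;56:zz' 'Jim:44;44:cz' 'Ann:44;44:yy' > demo.txt
# Jim and Ann have the identical key "44;44", so -u keeps only one of
# those two lines, but the duplicate "23" inside "23;23;56" is untouched:
sort -u -t : -k 2,2 demo.txt
```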
Upvotes: 3
Views: 163
Reputation: 133680
Could you please try the following. It creates 2 output files: lines whose 2nd column ends up with a single unique value go to out_file_two_cols
and lines whose 2nd column has more than one unique value go to out_file_more_than_two_cols
you could change the names as per your need.
awk '
BEGIN{
  FS=OFS=":"
}
{
  delete a
  val=""
  num=split($2,array,";")
  for(j=1;j<=num;j++){
    if(!a[array[j]]++){
      val=(val?val ";":"")array[j]
    }
  }
  $2=val
  num=split($2,array,";")
}
num==1{
  print > ("out_file_two_cols")
  next
}
{
  print > ("out_file_more_than_two_cols")
}
' Input_file
Explanation: In the BEGIN section, set the field separator and output field separator to :
for all lines of Input_file. In the main section, first delete array a and empty the variable val, both of which are used later in the program; clearing them on every line avoids carrying over values from the previous line.
Split the 2nd field into an array on the ;
delimiter and store its total number of elements in the num variable. Run a for loop from 1 to num to traverse all elements of the 2nd field.
For each element, check whether it is already present in array a; if not, append it to val, and keep doing this for all elements of the 2nd field.
Then assign the value of val back to the 2nd column, and split the new 2nd column again so that num now tells us how many unique elements it has.
Finally, if num is 1, the current (edited) 2nd field has only one element, so print the line to one output file; otherwise print it to the other output file.
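A quick way to sanity-check the script above is to feed it the question's sample data and inspect the two output files (the here-doc below just recreates the sample as Input_file):

```shell
cat > Input_file <<'EOF'
Joe:23;23;56:zz
Jim:44;44:cz
Rob:45;98:fc
EOF

awk '
BEGIN{ FS=OFS=":" }
{
  delete a; val=""
  num=split($2,array,";")
  for(j=1;j<=num;j++) if(!a[array[j]]++) val=(val?val ";":"")array[j]
  $2=val
  num=split($2,array,";")
}
num==1{ print > ("out_file_two_cols"); next }
{ print > ("out_file_more_than_two_cols") }
' Input_file

cat out_file_two_cols            # -> Jim:44:cz
cat out_file_more_than_two_cols  # -> Joe:23;56:zz and Rob:45;98:fc
```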
Upvotes: 4
Reputation: 35156
Source data:
$ cat colon.dat
Joe:23;23;56:zz
Jim:44;44:cz
Rob:45;98:fc
One awk solution:
awk -F":" ' # input field separator is colon
BEGIN { OFS=FS } # output field separate or colon
{ n=split($2,arr,";") # split field 2 by semi-colon
m=n # copy our array count
delete seen # reset seen array
for ( i=1 ; i<=n ; i++ ) { # loop through array indices
if ( arr[i] in seen ) { # if entry has been seen then ...
delete arr[i] # remove from array and ....
m-- # decrement our total count
}
else {
seen[arr[i]] # otherwise add element to the seen arrray
}
}
outf="single.out" # output file for single entries
if ( m >= 2 ) {
outf="multiple.out" # output file for multiple entries
}
printf "%s%s", $1, OFS > outf # print header
sep="" # separator for first field is empty string
for ( i in arr ) { # print remaining array elements
printf "%s%s", sep, arr[i] > outf
sep=";" # set separator to semi-colon for fields 2+
}
printf "%s%s\n", OFS, $3 > outf # print trailer
}
' colon.dat
NOTE: Remove comments to declutter code.
The above generates the following:
$ cat single.out
Jim:44:cz
$ cat multiple.out
Joe:23;56:zz
Rob:45;98:fc
Upvotes: 3
Reputation: 67537
another awk (the join here walks the elements in input order, since iterating an array with `for (k in b)` has unspecified order):
$ awk '
function join(arr, n,   i, s, sep) {    # i, s, sep are local variables
  for(i=1; i<=n; i++)                   # walk elements in input order
    if(!(arr[i] in b)) {                # first time we see this value ...
      b[arr[i]]                         # ... remember it in b
      s=s sep arr[i]; sep=";"           # ... and append it to the result
    }
  return s
}
BEGIN {FS=OFS=":"}
{n=split($2,a,";")
 delete b
 $2=join(a, n)
 if(length(b)==1) print                 # length(array) is a gawk extension
 else multi=multi ORS $0}
END {print "\nmultiple values:" multi}' file
Jim:44:cz
multiple values:
Joe:23;56:zz
Rob:45;98:fc
Upvotes: 4
Reputation: 247042
Here's a little Ruby script:
#!/usr/bin/env ruby
input = File.new "file_name"
single = File.new "one_value", "w"    # lines whose column 2 has a single unique value
multiple = File.new "two_value", "w"  # lines whose column 2 has several unique values

input.each do |line|
  fields = line.split ":"
  value = fields[1].split(";").uniq.sort
  fields[1] = value.join ";"
  new_line = fields.join ":"
  if value.size == 1
    single << new_line
  else
    multiple << new_line
  end
end

[input, single, multiple].each(&:close)
Upvotes: 4