bison72
bison72

Reputation: 324

Paste files with unequal lines and multiple columns in Unix

Let say I have three files with the content as below:

file1:

AAA     BBB     CCC
123     563     2467

file2:

NA      NA      NA

file3:

DDD     EEE     FFF
2333    236     908
123     376     887

I want to paste these files and insert 0 for empty fields like below:

AAA     BBB     CCC     NA      NA      NA      DDD     EEE     FFF
123     563     2467    0       0       0       2333    236     908
0       0       0       0       0       0       123     376     887

I couldn't even get the first step passed to paste. when I tried paste file1 file2 file3 it becomes like:

AAA BBB CCC NA  NA  NA  DDD EEE FFF
123 563 2467        2333    236 908
        123 376 887

Although I found that pr -m -t file1 file2 file3 works but it does not display full strings as I have some long strings.

How do I get it worked in this case? Thanks in advance.

Upvotes: 0

Views: 346

Answers (3)

RavinderSingh13
RavinderSingh13

Reputation: 133458

Could you please try following. This looks more Robust, following are the advantages of this one.

  • It should work without GNU awk too(though I couldn't test on all versions of it).
  • It will look for maximum number of lines in all passed Input_files and will print zeroes as per max number of fields in that null line.
  • It will also look if a line(2nd line in 1st Input_file and 2nd line in 2nd Input_file) and if they have lesser number of fields it will add remaining number fields too(append simply) to line.

awk '
FNR==1{
  total_fields=total_fields>NF?total_fields:NF
  ++file
}
{
  file_lines[file]++
  a[FNR,file]=$0
}
END{
  for(i in file_lines){
     val=val>file_lines[i]?val:file_lines[i]
  }
  for(i=1;i<=val;i++){
     for(j=1;j<=file;j++){
       num=split(a[i,j],array," ")
       if(num<total_fields){
          new_val=sprintf("%0"total_fields-num"d","")
          ##printf "....... "total_fields-num">>>>"new_val" "
          gsub(/0/,"0 ",new_val)
          a[i,j]=(a[i,j]?a[i,j] OFS:"")new_val
          new_val=""
       }
       if(!a[i,j]){
          a[i,j]=sprintf("%0"total_fields"d",a[i,j])
          gsub(/0/,"0 ",a[i,j])
       }
       printf("%s%s",a[i,j],j==file?ORS:OFS)
       new_val=""
     }
  }
}
'   file1  file2  file3  |  column -t

Upvotes: 1

karakfa
karakfa

Reputation: 67467

here is another approach

$ paste -d~ file1 file2 file3 | 
  awk -F~ '{$1=$1; for(i=1;i<=NF;i++) if($i=="") $i=nvl[i]}1
     NR==1 {for(i=1;i<=NF;i++) {gsub(/[^ ]+/,0,$i); nvl[i]=$i}}' | column -t


AAA  BBB  CCC   NA  NA  NA  DDD   EEE  FFF
123  563  2467  0   0   0   2333  236  908
0    0    0     0   0   0   123   376  887

for the pr command you can disable truncation with -J option which will work for long strings, but no missing value substitution.

$ pr -Jmt file{1..3}

Upvotes: 0

James Brown
James Brown

Reputation: 37394

You could use GNU awk for it (modified from this answer. Notice the ordering part):

$ gawk '
BEGIN {
    FS=OFS="\t"                                                   # set delimiters
}
{
    for(i=1;i<=NF;i++)                                            # iterate data fields
        a[FNR][s+i]=$i                                            # hash them
}
ENDFILE {
    s+=NF                                                         # store field count
}
END {
    for(i in a) {                                                 # in awk order
        for(j=1;j<=s;j++)                                         # and data
            printf "%s%s",(a[i][j]==""?0:a[i][j]),(j==s?ORS:OFS)  # output
    }
}' file1 file2 file3

Output:

AAA     BBB     CCC     NA      NA      NA      DDD     EEE     FFF
123     563     2467    0       0       0       2333    236     908
0       0       0       0       0       0       123     376     887

Upvotes: 1

Related Questions