pasting file side by side

Question

I have many ascii files in a directory, i just want to sort the file name numerically and want to paste side by side. Secondly, after pasting i want to make all the column of same length by appending zero at the end.

My files are named as

 data_Z_1   data_N_457 data_E_45
  1.5          1.2       2.3
  2.0          2.3       1.8
               4.5

At first I just want sort the above file names numerically as given below and then want to paste side by side as

data_Z_1  data_E_45 data_N_457
  1.5       2.3      1.2
  2.0       1.8      2.3
                     4.5

Secondly i need to make all the columns equal length in a pasted file, so that output should be like

   1.5    2.3    1.2
   2.0    1.8    2.3
   0.0    0.0    4.5

I tried as below:
ls data_*_* | sort -V

But it doesnot work.Can anybody help me overcoming this problem.Thanks in advance.

tshiono · Accepted Answer

Would you please try the following:

paste $(ls data* | sort -t_ -k3n) | awk -F'	' -v OFS='	' '
{for (i=1; i<=NF; i++) if ($i == "") $i = "0.0"} 1'

Output:

1.5     2.3     1.2
2.0     1.8     2.3
0.0     0.0     4.5

sort -t_ -k3n sets the field separator to _ and numerically sorts the filenames on the 3rd field values.
The options -F' ' -v OFS=' ' to the awk command assign input/output field separator to a tab character.
The awk statement for (i=1; i<=NF; i++) if ($i == "") $i = "0.0" scans the input fields and sets 0.0 for the empty fields.
The final 1 is equivalent to print $0 to print the fields.

[Edit]
If you have huge number of files, it may exceed the capability of bash. Here is an alternative with python using dataframe.

#!/usr/bin/python

import glob
import pandas as pd
import re

files = glob.glob('data*')
files.sort(key=lambda x: int(re.sub(r'.*_', '', x)))    # sort filenames numerically by its number

dfs = []                                                # list of dataframes
for f in files:
    df = pd.read_csv(f, header=None, names=[f])         # read file and assign column
    df = df.apply(pd.to_numeric, errors='coerce')       # force the cell values to floats
    dfs.append(df)                                      # add as a new column
df = pd.concat(dfs, axis=1, join='outer')               # create a dataframe from the list of dataframes
df = df.fillna(0)                                       # fill empty cells
print(df.to_string(index=False, header=False))          # print the dataframe removing index and header

which will produce the same results.

pasting file side by side

Answers (1)

Related Questions