Reputation: 471
I have many ascii
files in a directory, i just want to sort the file name numerically and want to paste side by side.
Secondly, after pasting i want to make all the column of same length by appending zero at the end.
My files are named as
data_Z_1 data_N_457 data_E_45
1.5 1.2 2.3
2.0 2.3 1.8
4.5
At first I just want sort the above file names numerically as given below and then want to paste side by side as
data_Z_1 data_E_45 data_N_457
1.5 2.3 1.2
2.0 1.8 2.3
4.5
Secondly i need to make all the columns equal length in a pasted file, so that output should be like
1.5 2.3 1.2
2.0 1.8 2.3
0.0 0.0 4.5
I tried as below:
ls data_*_* | sort -V
But it doesnot work.Can anybody help me overcoming this problem.Thanks in advance.
Upvotes: 0
Views: 85
Reputation: 22022
Would you please try the following:
paste $(ls data* | sort -t_ -k3n) | awk -F'\t' -v OFS='\t' '
{for (i=1; i<=NF; i++) if ($i == "") $i = "0.0"} 1'
Output:
1.5 2.3 1.2
2.0 1.8 2.3
0.0 0.0 4.5
sort -t_ -k3n
sets the field separator to _
and numerically sorts
the filenames on the 3rd field values.-F'\t' -v OFS='\t'
to the awk command assign
input/output field separator to a tab character.for (i=1; i<=NF; i++) if ($i == "") $i = "0.0"
scans the input fields and sets 0.0
for the empty fields.1
is equivalent to print $0
to print the fields.[Edit]
If you have huge number of files, it may exceed the capability of bash. Here is an alternative with python
using dataframe
.
#!/usr/bin/python
import glob
import pandas as pd
import re
files = glob.glob('data*')
files.sort(key=lambda x: int(re.sub(r'.*_', '', x))) # sort filenames numerically by its number
dfs = [] # list of dataframes
for f in files:
df = pd.read_csv(f, header=None, names=[f]) # read file and assign column
df = df.apply(pd.to_numeric, errors='coerce') # force the cell values to floats
dfs.append(df) # add as a new column
df = pd.concat(dfs, axis=1, join='outer') # create a dataframe from the list of dataframes
df = df.fillna(0) # fill empty cells
print(df.to_string(index=False, header=False)) # print the dataframe removing index and header
which will produce the same results.
Upvotes: 1