Reputation:
I am working with the set of txt file containing multi column information present in one line. Within my bash script I use the following AWK expression to take the filename from each of the txt filles as well as the number from the 5th column and save it in 2 column format in results.CSV file (piped to SED, which remove path of the file and its extension from the final CSV file):
awk '-F, *' '{if(FNR==2) printf("%s| %s \n", FILENAME,$5) }' ${tmp}/*.txt | sed 's|\/Users/gleb/Desktop/scripts/clusterizator/tmp/||; s|\.txt||' >> ${home}/"${experiment}".csv
obtaining something (for 5 txt filles) like this as CSV:
lig177_cl_5.2| -0.1400
lig331_cl_3.5| -8.0000
lig394_cl_1.9| -4.3600
lig420_cl_3.8| -5.5200
lig550_cl_2.0| -4.3200
How it would be possible to modify my AWK expression in order to exclude "_cl_x.x" from the name of each txt file as well as add the name of the CSV as the comment to the first line of the resulted CSV file:
# results.CSV
lig177| -0.1400
lig331| -8.0000
lig394| -4.3600
lig420| -5.5200
lig550| -4.3200
Upvotes: 1
Views: 107
Reputation: 67567
based on the rest of the pipe, I think you want to do something like this and get rid of sed
invocations.
awk -F', *' 'FNR==2 {f=FILENAME;
sub(/.*\//,"",f);
sub(/_.*/ ,"",f);
printf("%s| %s\n", f, $5) }' "${tmp}"/*.txt >> "${home}/${experiment}.csv"
this will convert
/Users/gleb/Desktop/scripts/clusterizator/tmp/lig177_cl_5.2.txt
to
lig177
The pattern replacement is generic
/path/to/the/file/filename_otherstringshere...
will extract only filename
. From the last /
char to the first _
char. This is based the greedy matching of regex patterns.
For the output filename, it's easier to do it before awk call, since it's a one line only.
$ echo "${experiment}.csv" > "${home}/${experiment}.csv"
$ awk ... >> "${home}/${experiment}.csv"
Upvotes: 4