Reputation: 332
I have the following two text files:
file1
-7.7
-7.4
-7.3
-7.3
-7.3
file2
4.823
5.472
5.856
4.770
4.425
And I want to merge them side by side, separated by a comma:
file3
-7.7,4.823
-7.4,5.472
-7.3,5.856
-7.3,4.770
-7.3,4.425
I know this can be easily done with paste -d ',' file1 file2 > file3, but I want a solution that allows me to have control over each iteration, since my dataset is big and I also need to add other columns to the output file. E.g.:
A,-7.7,4.823,3
A,-7.4,5.472,2
B,-7.3,5.856,3
A,-7.3,4.770,1
B,-7.3,4.425,1
Here's what I got so far:
awk 'NR==FNR {a[$count]=$1; count+=1; next} {print a[$count] "," $1; count+=1;}' file1 file2 > file3
Output:
-7.3,4.823
-7.3,5.472
-7.3,5.856
-7.3,4.770
-7.3,4.425
I am new to bash and awk, so a detailed response would be appreciated :)
Edit:
Suppose I have a directory with pairs of files ending in two extensions, .ext1 and .ext2. Those files have parameters included in their names; for example, file_0_par1_par2.ext1 has its pair, file_0_par1_par2.ext2. Each file contains 5 values. I have a function to extract a file's serial number and parameters from its name. My goal is to write, to a single csv file (file_out.csv), the values present in the files along with the parameters extracted from their names.
Code:
for file1 in *.ext1 ; do
    for file2 in *.ext2 ; do
        # for each file ending with .ext2, verify if it is file1's corresponding pair
        # I know this is extremely time inefficient, since it's an O(n^2) operation, but I couldn't find another alternative
        if [[ "${file1%.*}" == "${file2%.*}" ]] ; then
            # extract file_number, par1 and par2 based on some conditions, then append to the csv file
            paste -d ',' "$file1" "$file2" | while IFS="," read -r var1 var2; do
                echo "$par1,$par2,$var1,$var2,$file_number" >> "file_out.csv"
            done
        fi
    done
done
Upvotes: 2
Views: 1723
Reputation: 256
Two pieces of advice that might help:
First, I suspect an Awk script that does what you want will be too long for a one-liner. I would write a multi-line script that takes file1 and file2 as arguments, and store it in a file called mymerge.awk or whatever. Here is a skeleton:
#!/usr/bin/awk -f
BEGIN {
file1=ARGV[1]; file2=ARGV[2]
}
# The guts of your script go here.
Then you can simply make your script executable (chmod +x mymerge.awk) and call it from the shell: mymerge.awk file1 file2. This approach has the advantage of making your script easy to read, reuse, and maintain.
The second piece of advice: use Awk's getline < file1 to read data from file1 instead of stdin, and likewise for file2. To store the lines you just read in variables, you can say
getline var1 < file1; getline var2 < file2
The GNU Awk User's Guide has a comprehensive and readable description of getline and how to use it.
I won't be able to write and test a working script for you tonight, but I hope this will help you make some progress.
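To flesh that skeleton out, here is a minimal sketch of what the guts could look like, with all the work done in the BEGIN block so Awk never tries to read the arguments as ordinary input (the names mymerge.awk, f1, and f2 are just illustrative):

```shell
cat > mymerge.awk <<'EOF'
#!/usr/bin/awk -f
BEGIN {
    file1 = ARGV[1]; file2 = ARGV[2]
    # Read both files in lockstep; getline returns 1 while lines remain.
    while ((getline var1 < file1) > 0 && (getline var2 < file2) > 0)
        print var1 "," var2
    close(file1); close(file2)
}
EOF
chmod +x mymerge.awk
printf '%s\n' -7.7 -7.4 -7.3 > f1
printf '%s\n' 4.823 5.472 5.856 > f2
./mymerge.awk f1 f2
```

This prints the merged pairs (-7.7,4.823 and so on); extra columns can be added inside the while loop's print.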
Upvotes: 0
Reputation: 204558
The way to efficiently do what your updated question describes:
Suppose I have a directory with pairs of files ending in two extensions, .ext1 and .ext2. Those files have parameters included in their names; for example, file_0_par1_par2.ext1 has its pair, file_0_par1_par2.ext2. Each file contains 5 values. I have a function to extract a file's serial number and parameters from its name. My goal is to write, to a single csv file (file_out.csv), the values present in the files along with the parameters extracted from their names.
for file1 in *.ext1 ; do
    for file2 in *.ext2 ; do
        # for each file ending with .ext2, verify if it is file1's corresponding pair
        # I know this is extremely time inefficient, since it's an O(n^2) operation, but I couldn't find another alternative
        if [[ "${file1%.*}" == "${file2%.*}" ]] ; then
            # extract file_number, par1 and par2 based on some conditions, then append to the csv file
            paste -d ',' "$file1" "$file2" | while IFS="," read -r var1 var2; do
                echo "$par1,$par2,$var1,$var2,$file_number" >> "file_out.csv"
            done
        fi
    done
done
would be (untested):
for file1 in *.ext1; do
    base="${file1%.*}"
    file2="${base}.ext2"
    paste -d ',' "$file1" "$file2" |
    awk -v base="$base" '
        BEGIN { split(base,b,/_/); FS=OFS="," }
        { print b[3], b[4], $1, $2, b[2] }
    '
done > 'file_out.csv'
Doing base="${file1%.*}"; file2="${base}.ext2" itself would be N^2 times more efficient (given N pairs of files) than for file2 in *.ext2 ; do if [[ "${file1%.*}" == "${file2%.*}" ]] ; then, and doing | awk '...' itself would be an order of magnitude more efficient than | while IFS="," read -r var1 var2; do echo ...; done (see why-is-using-a-shell-loop-to-process-text-considered-bad-practice), so you can expect to see a huge improvement in performance over your existing script.
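As a quick sanity check of that approach, here it is run against a pair of invented sample files (the names and values are placeholders, not the question's real data):

```shell
printf '%s\n' -7.7 -7.4 > file_0_A_B.ext1
printf '%s\n' 4.823 5.472 > file_0_A_B.ext2

for file1 in *.ext1; do
    base="${file1%.*}"               # file_0_A_B
    file2="${base}.ext2"             # its pair, found by name, no inner loop
    paste -d ',' "$file1" "$file2" |
    awk -v base="$base" '
        BEGIN { split(base,b,/_/); FS=OFS="," }   # b[2]=serial, b[3]/b[4]=params
        { print b[3], b[4], $1, $2, b[2] }
    '
done > file_out.csv
cat file_out.csv
```

This writes A,B,-7.7,4.823,0 and A,B,-7.4,5.472,0, i.e. the parameters, the paired values, and the serial number from the file name.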
Upvotes: 2
Reputation: 20032
Your command fails:
awk 'NR==FNR {a[$count]=$1; count+=1; next} {print a[$count] "," $1; count+=1;}' file1 file2 > file3
Don't use $count but count ($count refers to a field of the current line, not your array index), start count at 1, and reset it to 1 when starting on file2. The last two fixes can be made with FNR==1 {count=1} or simply {count=FNR}.
When count is always the same as FNR, why use count at all?
awk 'NR==FNR {a[FNR]=$1; next} {print a[FNR] "," $1; }' file1 file2
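Since the question also needs extra columns per row, the FNR-indexed version extends naturally; in this sketch the label logic and the trailing counter are invented placeholders for whatever you actually compute:

```shell
printf '%s\n' -7.7 -7.4 > file1
printf '%s\n' 4.823 5.472 > file2

# First pass caches file1's values by line number; the second pass pairs
# them with file2's lines and adds the extra columns.
awk 'NR==FNR { a[FNR]=$1; next }
     { label = (FNR % 2 ? "A" : "B")          # placeholder per-row logic
       print label "," a[FNR] "," $1 "," FNR }' file1 file2 > file3
cat file3
```

This produces A,-7.7,4.823,1 and B,-7.4,5.472,2, matching the shape of the desired four-column output.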
Upvotes: 2
Reputation: 704
awk 'BEGIN {FS=","} {getline file2_line < "file2.txt"; print $1","file2_line }' file1.txt
The BEGIN block sets the field separator to a comma, but this only applies to the data in file1.txt.
The first statement in the main body of the script stores the value of this line from file2.txt into a variable named file2_line. This variable contains the entire line from file2.txt, and the data in the line is not split into fields in the usual fashion. This means that if file2.txt were also comma-delimited, you would probably want to use awk's split function to split the string into an array in order to work with the individual fields.
In awk, one concatenates by simply writing string values one after another, so print $1","file2_line writes the first field from the first file, a literal comma, and the string value for this line of file2.txt which we stored earlier.
Upvotes: 0
Reputation: 382
You can use your paste solution as-is; just add a while loop, for example, to take control of each iteration:
paste -d ',' file1 file2 | while IFS="," read -r lineA lineB; do
    # you can build the new file here as you need
    echo "$lineA,$lineB"
done
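For instance, to add a leading label and a running line counter inside the loop (both extra columns are just illustrative):

```shell
printf '%s\n' -7.7 -7.4 > file1
printf '%s\n' 4.823 5.472 > file2

i=0
paste -d ',' file1 file2 | while IFS="," read -r lineA lineB; do
    i=$((i + 1))                   # note: i lives in the pipeline's subshell
    echo "A,$lineA,$lineB,$i"
done > file3
cat file3
```

This writes A,-7.7,4.823,1 and A,-7.4,5.472,2; keep in mind that variables set inside the piped while loop are lost once the pipeline ends.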
Upvotes: 2