Reputation: 871
I've been working on a bash script, and at one point in it I need to process two CSV files at once using awk to produce several output files. In short, there is a main file holding the records to be dispatched to the output files, and a second file that supplies each output file's name and the number of records it should hold. The first n records go to the first output file, records n+1 to n+k go to the second one, and so on.
To be clearer, here's an example of how the main record file might look:
x11,x21
x12,x22
x13,x23
x14,x24
x15,x25
x16,x26
x17,x27
x18,x28
x19,x29
and how the other file might look:
out_file_name_1,2
out_file_name_2,3
out_file_name_3,4
Then the first output file, named out_file_name_1, should look like:
x11,x21
x12,x22
Then the second output file, named out_file_name_2, should look like:
x13,x23
x14,x24
x15,x25
And the last one should look like:
x16,x26
x17,x27
x18,x28
x19,x29
Hopefully it is clear enough.
Upvotes: 0
Views: 795
Reputation: 10865
Here's a solution in awk since you asked, but clearly triplee's answer is the nicer approach.
$ cat oak.awk
BEGIN { FS = ","; fidx = 1 }
# Processing files.txt, init parallel arrays with filename and number of records
# to print to each one.
NR == FNR {
    file[NR] = $1
    records[NR] = $2
    next
}
# Processing main.txt. Print record to current file. Decrement number of records to print,
# advancing to the next file when number of records to print reaches 0
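fidx in file && records[fidx] > 0 {
    # Parenthesize the target so every awk parses the redirection the same way
    print > (file[fidx])
    # Close each finished file so a long list does not exhaust open file handles
    if (! --records[fidx]) { close(file[fidx]); ++fidx }
    next
}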
# If we get here, either we ran out of files before reading all the records
# or a file was specified to contain zero records
{
    print "Error: Insufficient number of files or file with non-positive number of records" > "/dev/stderr"
    exit 1
}
$ cat files.txt
out_file_name_1,2
out_file_name_2,3
out_file_name_3,4
$ cat main.txt
x11,x21
x12,x22
x13,x23
x14,x24
x15,x25
x16,x26
x17,x27
x18,x28
x19,x29
$ awk -f oak.awk files.txt main.txt
$ cat out_file_name_1
x11,x21
x12,x22
$ cat out_file_name_2
x13,x23
x14,x24
x15,x25
$ cat out_file_name_3
x16,x26
x17,x27
x18,x28
x19,x29
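A quick way to confirm a split like the one above is to compare each output file's line count with the count requested in files.txt. The following is only a minimal sketch: the function name check_split is made up for illustration, and it assumes the spec file uses the same name,count layout and is run from the directory that holds the output files.

```shell
# check_split is a hypothetical helper name, not part of the answer above.
# $1 is the "name,count" spec file. Every output file it lists is expected
# to contain exactly the requested number of records.
check_split() {
    local name want have status=0
    while IFS=, read -r name want; do
        if [ ! -f "$name" ]; then
            echo "missing: $name" >&2
            status=1
            continue
        fi
        # Count records with awk so a final line without a newline still counts
        have=$(awk 'END { print NR }' "$name")
        if [ "$have" -ne "$want" ]; then
            echo "mismatch: $name has $have records, expected $want" >&2
            status=1
        fi
    done < "$1"
    return "$status"
}
```

It could be called as check_split files.txt right after the awk run; a non-zero exit status means at least one file is missing or has the wrong number of records.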
Upvotes: 1
Reputation: 189297
I wouldn't use Awk for this.
while IFS=, read -r -u 3 filename lines; do
    head -n "$lines" >"$filename"
done 3<other.csv <main.csv
Using read -u to read from a particular file descriptor is not completely portable, I believe, but your question is tagged bash, so I am assuming that is not a problem here.
Demo: http://ideone.com/6FisHT
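If read -u is a concern, redirecting descriptor 3 onto read's standard input with <&3 should give the same effect using only standard shell redirection. This is a sketch rather than a tested drop-in: the wrapper name split_by_spec and its two positional arguments are made up for illustration, and it still relies on head leaving the input positioned right after the lines it printed.

```shell
# split_by_spec is a hypothetical wrapper name.
# $1: "name,count" file (other.csv above), $2: main record file (main.csv above).
split_by_spec() {
    # <&3 makes descriptor 3 the standard input of read only, so head
    # keeps reading the main file on the loop's own standard input.
    while IFS=, read -r filename lines <&3; do
        head -n "$lines" > "$filename"
    done 3< "$1" < "$2"
}
```

It would be invoked as split_by_spec other.csv main.csv from the directory where the output files should be created.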
If you end up with empty files after the first (which can happen if head reads ahead and consumes more input than the lines it prints), maybe try replacing head with explicit read statements.
while IFS=, read -r -u 3 filename lines; do
    for ((i = 0; i < lines; i++)); do
        # Stop early if main.csv runs out instead of writing blank lines
        IFS= read -r line || break
        printf '%s\n' "$line"
    done >"$filename"
done 3<other.csv <main.csv
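Neither loop checks that main.csv holds as many records as other.csv asks for. A pre-flight comparison along the following lines could catch that before any output file is written. It is a sketch under assumptions: the helper name enough_records is made up, and the second CSV column is assumed to hold a plain non-negative integer.

```shell
# enough_records is a hypothetical helper name.
# $1: "name,count" file, $2: main record file.
# Succeeds when the main file has at least as many records as requested in total.
enough_records() {
    local wanted have
    # Sum the second column; "+ 0" forces a number even for an empty file
    wanted=$(awk -F, '{ total += $2 } END { print total + 0 }' "$1")
    have=$(awk 'END { print NR }' "$2")
    if [ "$have" -lt "$wanted" ]; then
        echo "need $wanted records but $2 has only $have" >&2
        return 1
    fi
}
```

For example, enough_records other.csv main.csv || exit 1 placed ahead of the loop would abort the script early.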
Upvotes: 1