Josh H

Reputation: 83

Using AWK to break out pieces of a single file into multiple files, but I need further direction

I am still very new to this type of task, but I have exhausted my resources and thus am reaching out for a helping hand.

I have a single file composed of concatenated files. I am able to use the exact line of code below to break the files apart:

awk '/PATTERN/{x="F"++i;}{print > x;}' sourceFile

BUT -

  1. If possible, I would like to specify a directory for the output files. The script above writes them to the same directory as "sourceFile"; I would like them to go into some sort of temp directory instead.

  2. It would be extremely helpful if the output files could keep the "sourceFile" name with a counter on the end while keeping a .txt extension, e.g. sourceFile1.txt, sourceFile2.txt, etc.

I have tried the following to retain the sourceFile name, but it was unsuccessful:

set F=sourceFile
awk '/PATTERN/{x="F"++i;}{print > x;}' sourceFile

I apologize if this is rudimentary, but this could greatly aid in daily tasks - so I was hoping someone could help. Thank you in advance!

Upvotes: 3

Views: 7159

Answers (2)

Chris Seymour

Reputation: 85893

You are pretty much there: just prefix the filename with the directory and append the file extension using string concatenation:

awk '/PATTERN/{file="tmp/"(FILENAME)(++i)".txt"}{print > file}' sourceFile

We don't need a shell variable for the input file name; we can use awk's built-in FILENAME variable instead.
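One detail worth knowing: awk's `>` redirection does not create directories, so `tmp/` has to exist before the command runs. The sketch below, which assumes a POSIX shell with `mktemp -d` available (common on Linux and macOS, though not strictly POSIX), creates a fresh temporary directory and passes it to awk with `-v`. The sample input and the `outdir`/`dir` names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input, for illustration only.
printf 'PATTERN sf1\nsf1\nPATTERN sf2\nsf2\nsf2\n' > sourceFile

# awk's ">" will not create a missing directory, so create one first.
# mktemp -d makes a new, uniquely named temporary directory and prints its path.
outdir=$(mktemp -d)

# Pass the directory in with -v; FILENAME is awk's name for the current input file.
awk -v dir="$outdir" '/PATTERN/{file = dir "/" FILENAME (++i) ".txt"}{print > file}' sourceFile

ls "$outdir"
```

With this input, the directory should end up holding `sourceFile1.txt` and `sourceFile2.txt`.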

Demo:

$ cat sourceFile 
PATTERN sf1
sf1
sf1
sf1
PATTERN sf2
sf2
sf2
PATTERN sf3
sf3
sf3

$ awk '/PATTERN/{file="tmp/"(FILENAME)(++i)".txt"}{print > file}' sourceFile

$ cat tmp/sourceFile1.txt
PATTERN sf1
sf1
sf1
sf1

$ cat tmp/sourceFile2.txt 
PATTERN sf2
sf2
sf2

$ cat tmp/sourceFile3.txt 
PATTERN sf3
sf3
sf3

Upvotes: 2

Kent

Reputation: 195269

awk can accept shell variables through -v, so if you want to set the directory and file name:

D="/path/to/newfiles/"
F="sourceFile"

awk -v d="$D" -v f="$F" '/PATTERN/{x=d f (++i) ".txt"}{print > x;}' sourceFile

Now the target directory and file name are dynamic; you can set them to the proper values before the awk call.
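Because `D` and `F` are ordinary shell variables, they can be set differently for each run. The sketch below is one way to use that, assuming you have several concatenated files to split: it loops over them and derives `F` from each input name with `basename`. The file names `reportA`/`reportB` and the directory `./newfiles/` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample inputs, for illustration only.
printf 'PATTERN a1\na1\nPATTERN a2\na2\n' > reportA
printf 'PATTERN b1\nb1\n' > reportB

# Assumed target directory; awk will not create it, so mkdir -p does.
D="./newfiles/"
mkdir -p "$D"

# Set F before each awk call so every input gets its own output prefix.
# The counter i starts from zero again in each awk invocation.
for src in reportA reportB; do
    F=$(basename "$src")
    awk -v d="$D" -v f="$F" '/PATTERN/{x = d f (++i) ".txt"}{print > x}' "$src"
done

ls "$D"
```

This should produce `reportA1.txt`, `reportA2.txt` and `reportB1.txt` in `./newfiles/`.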

There is one more thing to pay attention to: how many times PATTERN occurs in your file. If there are too many matches, you may see an error message like "too many open files". In that case, you have to close the previous file before writing to a new one.
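A minimal sketch of the closing approach, using awk's `close()` built-in so that at most one output file is open at a time. The 50-section input is generated purely for illustration, and the `tmp/` directory and `sourceFile` name follow the first answer.

```shell
#!/bin/sh
# Hypothetical input with many PATTERN sections, for illustration only.
: > sourceFile
i=1
while [ "$i" -le 50 ]; do
    printf 'PATTERN %d\nbody %d\n' "$i" "$i" >> sourceFile
    i=$((i + 1))
done

mkdir -p tmp

# On each PATTERN line, close the previous output file (if any) before
# switching to the next name, so open file handles never pile up.
awk '/PATTERN/{if (file) close(file); file = "tmp/" FILENAME (++n) ".txt"}{print > file}' sourceFile

ls tmp
```

Since each output name is used only once, closing it early is safe; reopening a closed file with `>` later in the same run would truncate it.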

Upvotes: 2
