badner

Reputation: 818

Splitting a file into multiple files on a delimiter with awk

I am trying to split a file evenly into a number of chunks. This is my code:

awk '/*/ { delim++ } { file = sprintf("splits/audio%s.txt", int(delim /2)); print >> file; }' < input_file

My file looks like this:

"*/audio1.lab"
0 6200000 a
6200000 7600000 b
7600000 8200000 c
.
"*/audio2.lab"
0 6300000 a
6300000 8300000 w
8300000 8600000 e
8600000 10600000 d
.

It gives me an error:

awk: line 1: syntax error at or near *

I do not know enough about awk to understand this error. I tried escaping characters but still haven't been able to figure it out. I could write a script in Python, but I would like to learn how to do this in awk. Any awkers know what I am doing wrong?

Edit: I have 14021 files. I gave the first two as an example.

Upvotes: 0

Views: 512

Answers (2)

Ed Morton

Reputation: 203169

idk what all the other stuff around this question is about but to just split your input file into separate output files all you need is:

awk '/\*/{close(out); out="splits/audio" (++c) ".txt"} {print > out}' file

"Repetition" metacharacters like *, ? or + can take on a literal meaning when they are the first character in a regexp, so the regexp /*/ will work just fine in some awks (e.g. gawk) but not in all of them. Since you apparently also have a problem with having too many files open, you must not be using gawk (which manages open files for you), so you probably need to both escape the * and close() each output file when you're done writing to it. No harm doing that, and it makes the script portable to all awks.
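As a minimal sketch of the one-file-per-record approach described above, the following can be run in a scratch directory. The temporary directory, the shortened sample records, and the variable name tmp are assumptions made for illustration; only the awk program itself comes from this answer.

```shell
# Hypothetical scratch directory, used only for this illustration.
tmp=$(mktemp -d)
mkdir "$tmp/splits"

# Two shortened records in the format shown in the question.
cat > "$tmp/input_file" <<'EOF'
"*/audio1.lab"
0 6200000 a
6200000 7600000 b
.
"*/audio2.lab"
0 6300000 a
6300000 8300000 w
.
EOF

# Start a new output file at every line containing a literal "*",
# closing the previous one so that only one file is open at a time.
( cd "$tmp" && awk '/\*/ { close(out); out = "splits/audio" (++c) ".txt" } { print > out }' input_file )

# List the files that were created, one per record.
ls "$tmp/splits"
```

Because each output file is closed before the next one is opened, the number of records should not matter, even in an awk that does not manage open files itself.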

Upvotes: 1

David Gish

Reputation: 750

For one thing, your regular expression is illegal; '*' says to match the previous character 0 or more times, but there is no previous character.

It's not entirely clear what you're trying to do, but it looks like when you encounter a line with an asterisk you want to bump the file number. To match an asterisk, you'll need to escape it:

awk '/\*/ { close(file); delim++ } { file = sprintf("splits/audio%d.txt", int(delim /2)); print >> file; }' < input_file

Also note that %d is the correct format specifier for decimal output from an int.
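One thing to watch with int(delim / 2): delim is 1 at the first record, so the first output file receives one record while the later ones receive two. If the goal is the same number of records in every file, a sketch along the following lines may help. The chunk size n, the variable target, the scratch directory, and the four tiny sample records are all assumptions for illustration, not part of the question.

```shell
# Hypothetical scratch directory; four short records stand in for the real data.
tmp2=$(mktemp -d)
mkdir "$tmp2/splits"
printf '%s\n' '"*/audio1.lab"' '0 1 a' '.' \
              '"*/audio2.lab"' '0 1 b' '.' \
              '"*/audio3.lab"' '0 1 c' '.' \
              '"*/audio4.lab"' '0 1 d' '.' > "$tmp2/input_file"

# n is an assumed chunk size (records per output file).
# Using (delim - 1) maps records 1..n to chunk 0, n+1..2n to chunk 1, and so on.
( cd "$tmp2" && awk -v n=2 '
  /\*/ {
      delim++
      target = sprintf("splits/audio%d.txt", int((delim - 1) / n))
      # Only close and switch when the chunk number actually changes.
      if (target != file) { close(file); file = target }
  }
  { print >> file }
' input_file )

# Count the records (lines containing a literal "*") in each output file.
grep -c '\*' "$tmp2"/splits/*.txt
```

Since the script appends with >>, the splits directory should be empty before each run; otherwise the output of earlier runs stays in the files.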

Upvotes: 1
