Reputation: 123
I'm trying to split a file using an AWK one-liner, but the code below that I came up with is not working properly.
awk '
BEGIN { idx=0; file="original_file.split." }
/^REC_DELIMITER.(HIGH|TOP)$/ { idx++ }
/^REC_DELIMITER.TOP$/,/^REC_DELIMITER.(HIGH|TOP)$/ { print > file sprintf("%03d", idx) }
' original_file
The test file is "original_file":
REC_DELIMITER.TOP
lineA1
lineA2
lineA3
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3
REC_DELIMITER.TOP
lineC1
lineC2
lineC3
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3
The AWK code above targets REC_DELIMITER.TOP, and it is giving me these files:
original_file.split.001:
REC_DELIMITER.TOP
original_file.split.003:
REC_DELIMITER.TOP
However, I'm trying to get this:
original_file.split.001:
REC_DELIMITER.TOP
lineA1
lineA2
lineA3
original_file.split.003:
REC_DELIMITER.TOP
lineC1
lineC2
lineC3
There will be other record delimiters, and when needed we can run it for them as well, e.g. REC_DELIMITER.HIGH, to get files like these:
original_file.split.002:
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3
original_file.split.004:
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3
Any help is very much appreciated. I have been trying to get this working for the past few days, and the AWK code above is the best I was able to come up with. I now need help from the AWK masters. :)
Thank you!
Upvotes: 3
Views: 1587
Reputation: 123
I'm not very used to AWK; however, plasticide's answer pointed me in the right direction, and I finally got an AWK script working as required.
In the code below, the first IF sets echo to 0 if any delimiter is found. The second IF sets echo to 1 if a wanted delimiter is found, so only the wanted records are split out of the file.
I know the regex could be something like /^REC_(DELIMITER\.(TOP|HIGH|LOW)|NO_CATEGORY)$/
but since the regex is created dynamically by a shell script that reads a list of delimiters from a specific file, it will look more like the one in the AWK below.
awk 'BEGIN {
    idx=0; echo=1; file="original_file.split."
}
{
    # All the delimiters to consider in the given file
    if ($0 ~ /^(REC_DELIMITER.TOP|REC_DELIMITER.HIGH|REC_DELIMITER.LOW|REC_NO_CATEGORY)$/) {
        echo=0
    }
    # Delimiters that should actually be pulled
    if ($0 ~ /^(REC_DELIMITER.HIGH|REC_DELIMITER.LOW)$/) {
        idx++; echo=1
    }
    # Print to a file if the line belongs to a wanted delimiter
    if (echo) {
        print > (file idx)
    }
}' original_file
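For the dynamically built regex mentioned above, here is a rough sketch of how the shell side might look. The file names all_delims.txt and wanted_delims.txt and the helper build_regex are made up for illustration. Dots are wrapped in brackets so they match literally and survive being passed through -v without escape trouble. Unlike the script above, echo starts at 0 here, so lines before the first wanted delimiter are skipped.

```shell
#!/bin/sh
# Sketch only: all_delims.txt, wanted_delims.txt and build_regex are
# hypothetical names. Sample files are created in the current directory.

printf '%s\n' REC_DELIMITER.TOP REC_DELIMITER.HIGH \
              REC_DELIMITER.LOW REC_NO_CATEGORY > all_delims.txt
printf '%s\n' REC_DELIMITER.HIGH REC_DELIMITER.LOW > wanted_delims.txt

printf '%s\n' REC_DELIMITER.TOP lineA1 \
              REC_DELIMITER.HIGH lineB1 lineB2 \
              REC_DELIMITER.TOP lineC1 \
              REC_DELIMITER.LOW lineD1 > original_file

# Turn a one-per-line delimiter list into an anchored alternation,
# e.g. ^(REC_DELIMITER[.]HIGH|REC_DELIMITER[.]LOW)$
build_regex() {
    sed 's/\./[.]/g' "$1" | paste -s -d '|' - | sed 's/^/^(/; s/$/)$/'
}

awk -v all="$(build_regex all_delims.txt)" \
    -v wanted="$(build_regex wanted_delims.txt)" '
    $0 ~ all    { echo = 0 }             # any delimiter ends the current record
    $0 ~ wanted { idx++; echo = 1 }      # a wanted delimiter starts a new file
    echo        { print > ("original_file.split." idx) }
' original_file
```

With the sample input above, this should leave the HIGH record in original_file.split.1 and the LOW record in original_file.split.2, while the two TOP records are dropped.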
Thank you all. I really appreciate it very much.
Upvotes: -2
Reputation: 2999
awk -vRS=REC_DELIMITER '/^.TOP\n/{print RS $0 > sprintf("original_file.split.%03d",n)};!++n' original_file
(Give or take an extra newline at the end.)
Generally, when input is supposed to be treated as a series of multi-line records with a special line as delimiter, the most direct approach is to set RS (and often ORS) to that delimiter.
Normally you'd want to add newlines to its beginning and/or end, but this case is a little special so it's easier without them.
Edited to add: You need GNU Awk for this. Standard Awk considers only the first character of RS.
Upvotes: 2
Reputation: 1240
I made some changes so that the different delimiters go to their own file, even when they occur later in the file. Make a file like splitter.awk with the contents below, then chmod +x it and run it with ./splitter.awk original_file
#!/usr/bin/awk -f
BEGIN {
idx=0;
file="original_file.split.";
out=""
}
{
if($0 ~ /^REC_DELIMITER.(TOP|HIGH)/){
if (!cnt[$0]) {
cnt[$0] = ++idx;
}
out=cnt[$0];
}
print > (file sprintf("%03d", out))
}
Upvotes: 1
Reputation: 77185
You can try something like this:
awk '
/REC_DELIMITER\.TOP/ {
a=1
b=0
file = sprintf (FILENAME".split.%03d",++n)
}
/REC_DELIMITER\.HIGH/ {
b=1
a=0
file = sprintf (FILENAME".split.%03d",++n)
}
a {
print $0 > file
}
b {
print $0 > file
}' file
Upvotes: 5
Reputation: 204638
You need something like this (untested):
awk -v dtype="TOP" '
BEGIN { dbase = "^REC_DELIMITER\\."; delim = dbase dtype "$" }
$0 ~ dbase { inBlock=0 }
$0 ~ delim { inBlock=1; idx++ }
inBlock { print > sprintf("original_file.split.%03d", idx) }
' original_file
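Since the script above is marked untested, here is a small variation wrapped in a shell function, offered as a sketch rather than a drop-in replacement. The function name split_by_delim is hypothetical. It makes two assumptions that differ from the script above: idx counts every delimiter line, so the file numbers match the ones in the question (001 and 003 for TOP, 002 and 004 for HIGH), and each output file is closed before the next one is opened, which should help on inputs with many records.

```shell
#!/bin/sh
# Sketch only: split_by_delim is a hypothetical helper name.
# The sample input from the question is created in the current directory.

printf '%s\n' REC_DELIMITER.TOP  lineA1 lineA2 lineA3 \
              REC_DELIMITER.HIGH lineB1 lineB2 lineB3 \
              REC_DELIMITER.TOP  lineC1 lineC2 lineC3 \
              REC_DELIMITER.HIGH lineD1 lineD2 lineD3 > original_file

# Usage: split_by_delim TYPE FILE   (TYPE is e.g. TOP or HIGH)
split_by_delim() {
    awk -v dtype="$1" '
        BEGIN { dbase = "^REC_DELIMITER[.]"; delim = dbase dtype "$" }
        $0 ~ dbase {                        # any delimiter starts a new record
            if (out != "") close(out)       # done with the previous output file
            idx++                           # number every record, wanted or not
            inBlock = ($0 ~ delim)          # keep only records of the wanted type
            out = inBlock ? sprintf("%s.split.%03d", FILENAME, idx) : ""
        }
        inBlock { print > out }
    ' "$2"
}

split_by_delim TOP  original_file
split_by_delim HIGH original_file
```

Running it once per delimiter type, as in the last two lines, should produce original_file.split.001 through original_file.split.004 with the contents listed in the question.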
Upvotes: 3