Reputation: 35
I have one long text file containing many articles that I want to split into individual text files (one article = one file) so I can use them in a CMS that works with text files.
I have prepared my file so I can tell the articles apart: each one ends with a divider line of hash marks, like so:
----
Content: First article text
----
Slug: first-article
----
####
----
Content: Second article text
----
Slug: second-article
----
####
I want to match every article via a regular expression and create a new text file for every match.
How would one do that? I already figured out the regex in Sublime ((?s)####\n\n.+?\n\n), but how do I output each match to a new text file? I assume there must be a way to do this with bash scripting (which I have never used before)?
Edit: Preferably each text file should be named after the article’s slug.
Any help is much appreciated!
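For what it's worth, if numbered output files were acceptable (they are not named after the slug), GNU coreutils' csplit can already do the raw splitting on the #### divider. A minimal sketch, assuming the sample layout above is saved as "file":

```shell
# Recreate the sample layout from the question in a file called "file"
cat > file <<'EOF'
----
Content: First article text
----
Slug: first-article
----
####
----
Content: Second article text
----
Slug: second-article
----
####
EOF

# Split before every line consisting of ####. {*} repeats the pattern
# for as many dividers as the file contains, and -z drops empty pieces.
# Pieces are written to article00, article01, ... (each later piece
# still starts with the #### line itself).
csplit -z -f article file '/^####$/' '{*}'
```

Note that {*} is a GNU extension, and csplit cannot name the files after the slug, which is why the awk answer below the question metadata is a better fit for the stated requirement.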
Upvotes: 0
Views: 469
Reputation: 12877
Using GNU awk
awk 'BEGIN {
RS="####" # set record separator to ####
}
{
slug=""; # initialise a variable for each record
if (match($0,/Slug:[^\n]*/)) { # match the Slug line only ("." would cross newlines in a multi-line record)
slug=substr($0,RSTART+6,RLENGTH-6); # extract the slug text after "Slug: "
gsub(/[[:space:]]+$/,"",slug); # trim any trailing whitespace
}
if (slug!="") {
print $0 > slug # if slug is set, print the record to a file named by the contents of slug
close(slug) # close each output file to avoid hitting the open-file limit
}
}' file
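To sanity-check this end to end, you can recreate the sample input from the question and run the script; each slug becomes a file name (this assumes the slugs are safe as file names, i.e. contain no slashes):

```shell
# Recreate the sample input from the question
cat > file <<'EOF'
----
Content: First article text
----
Slug: first-article
----
####
----
Content: Second article text
----
Slug: second-article
----
####
EOF

# Each record between #### markers is written to a file named
# after the text following "Slug: " in that record
awk 'BEGIN { RS="####" }
{
  slug = ""
  if (match($0, /Slug:[^\n]*/)) {       # find the Slug line
    slug = substr($0, RSTART + 6, RLENGTH - 6)  # text after "Slug: "
    gsub(/[[:space:]]+$/, "", slug)             # trim trailing whitespace
  }
  if (slug != "") {
    print $0 > slug                     # write the record to the slug file
    close(slug)                         # release the file descriptor
  }
}' file
```

This should leave files named first-article and second-article in the current directory, each holding its article's block. The close(slug) call matters once the input holds many articles, since every open output file consumes a file descriptor.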
One-liner:
awk 'BEGIN { RS="####" } { slug=""; if (match($0,/Slug:[^\n]*/)) slug=substr($0,RSTART+6,RLENGTH-6); gsub(/[[:space:]]+$/,"",slug); if (slug!="") { print $0 > slug; close(slug) } }' file
Upvotes: 2