r00ky

Reputation: 35

Find regex pattern in text file and output each match to new text file

I have one long text file with many articles in it that I want to split to individual text files (one article = one file) so I can use them in a CMS which works with text files.

I have prepared my file so that the articles are separated by a divider line of four hash characters (####), like so:

----
Content: First article text
----
Slug: first-article
----

####

----
Content: Second article text
----
Slug: second-article
----

####

I want to match every article with a regular expression and write each match to its own text file. How would one do that? I already worked out the regex in Sublime Text, (?s)####\n\n.+?\n\n, but how do I output each match to a new text file? I assume there must be a way with bash scripting (which I have never used before)?

Edit: Preferably each text file should be named after the article’s slug.

Any help is much appreciated!

Upvotes: 0

Views: 469

Answers (1)

Raman Sailopal

Reputation: 12877

Using GNU awk

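awk 'BEGIN {
             RS="####" # set the record separator to #### (a multi-character RS needs GNU awk)
           }
           {
             slug=""; # reset the variable for every record
             if (match($0,/Slug:[ \t]*[^\n]+/)) { # match the Slug line only, up to the end of that line
                      slug=substr($0,RSTART+5,RLENGTH-5); # keep the text after "Slug:"
                      gsub(/^[ \t]+|[ \t\r]+$/,"",slug); # trim surrounding whitespace
             }
             if (slug!="") {
                      print $0 > slug; # if slug is set, print the record to a file named after the slug
                      close(slug) # close the file so many articles do not exhaust open file handles
             }
            }' file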

One liner:

awk 'BEGIN { RS="####" } { slug=""; if (match($0,/Slug:[ \t]*[^\n]+/)) { slug=substr($0,RSTART+5,RLENGTH-5); gsub(/^[ \t]+|[ \t\r]+$/,"",slug) } if (slug!="") { print $0 > slug; close(slug) } }' file
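
To try this without touching real data, here is a minimal sketch that recreates the sample from the question in a scratch directory and runs an awk command of the same shape against it. The file name articles.txt and the use of mktemp are assumptions for the demo, not part of the question. It also assumes that awk on your system is GNU awk (or another awk that accepts a multi-character RS). In this variant the match is limited to the Slug line itself, so the file name cannot pick up the ---- divider that follows it.

```shell
#!/bin/sh
# Hypothetical demo: work in a scratch directory so no real files are overwritten.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the sample input from the question (articles.txt is an assumed name).
cat > articles.txt <<'EOF'
----
Content: First article text
----
Slug: first-article
----

####

----
Content: Second article text
----
Slug: second-article
----

####
EOF

# Split the input on ####, then name each output file after its Slug line.
awk 'BEGIN { RS="####" }
{
  slug = ""
  # Match only the Slug line, up to the end of that line.
  if (match($0, /Slug:[ \t]*[^\n]+/)) {
    slug = substr($0, RSTART + 5, RLENGTH - 5)   # text after "Slug:"
    gsub(/^[ \t]+|[ \t\r]+$/, "", slug)          # trim surrounding whitespace
  }
  # Records without a slug (such as the trailing newline) are skipped.
  if (slug != "") { print $0 > slug; close(slug) }
}' articles.txt

# Show what was created.
ls
```

If it works as intended, the listing should show first-article and second-article next to articles.txt. Each output file contains the whole record, including the blank lines around it, so you may want to trim those if the CMS is strict about leading whitespace.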

Upvotes: 2
