Reputation: 353

How do I split a file into several files by a multi-character delimiter?

On Unix, without installing anything extra (i.e. using only grep, awk, sed, cut, etc.), how can I split the following input into several files (e.g. "_temp1.txt", "_temp2.txt", etc.), with each file starting at a "codeView" line? Note that the "codeView" line may begin with several spaces.

What if the input is coming from an API instead of an existing file?

. . .
"events" : [ {
"id" : "123456",
"important" : true,
"codeView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "NORMAL_CODE",
      "value" : "str = wrapper.getParameter("
    }, {
      "type" : "NORMAL_CODE",
      "value" : ")"
    } ],
    "text" : "str = wrapper.getParameter(&quot;motif&quot;)"
  } ],
  "nested" : false
},
"probableStartLocationView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "&lt;init&gt;() @ JSONInputData.java:12"
    } ],
    "text" : "&lt;init&gt;() @ JSONInputData.java:92"
  } ],
  "nested" : false
},
"dataView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "TAINT_VALUE",
      "value" : "CP"
    } ],
    "text" : "{{#taint}}CP{{/taint}}"
  } ],
  "nested" : false
},
"collapsedEvents" : [ ],
"dupes" : 0
}, {
"id" : "28861,28862",
"important" : false,
"type" : "P2O",
"description" : "String Operations Occurred",
"extraDetails" : null,
          "codeView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "TEXT",
      "value" : "Over the following lines of code, blah blah."
    } ],
    "text" : "Over the following lines of code, blah blah."
  } ],
  "nested" : false
},
"probableStartLocationView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "remplaceString() @ O_UtilCaractere.java:234"
    } ],
    "text" : "remplaceString() @ O_UtilCaractere.java:234"
  }, {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "replaceString() @ O_UtilCaractere.java:333"
    } ],
    "text" : "replaceString() @ O_UtilCaractere.java:333"
  }, {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "creerIncidentPaie() @ Incidents.java:444"
    } ],
    "text" : "creerIncidentPaie() @ Incidents.java:219"
  }, {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "repliquerAbsenceIncident() @ Incidents.java:876"
    } ],
    "text" : "repliquerAbsenceIncident() @ IncidentsPaieMgr.java:882"
  } ],
  "nested" : false
},
"dataView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "TEXT",
      "value" : "insert into TGE_INCIDENT...4&amp;apos;, &amp;apos;YYYYMMDD&amp;apos;), &amp;apos;A&amp;apos;, &amp;apos;"
    }, {
      "type" : "TAINT_VALUE",
      "value" : "CP"
    }, {
      "type" : "TEXT",
      "value" : "&amp;apos;, &amp;apos;&amp;apos;, null, &amp;apos;T&amp;apos;, &amp;apos;ADPTVT&amp;apos;, to_date(&amp;apos;2013012214..."
    } ],
    "text" : "insert into TGE_INCIDENT...4&amp;apos;, &amp;apos;YYYYMMDD&amp;apos;), &amp;apos;A&amp;apos;, &amp;apos;{{#taint}}CP{{/taint}}&amp;apos;, &amp;apos;&amp;apos;, null, &amp;apos;T&amp;apos;, &amp;apos;ADPTVT&amp;apos;, to_date(&amp;apos;2017062214..."
  } ],
  "nested" : false
}
. . .

Upvotes: 1

Answers (3)

karakfa

Reputation: 67567

awk to the rescue!

$ awk '/"codeView"/{c++} {print > ("_temp" (c+0) ".txt")}' file

The header up to the first match will be in the 0th temp file. If the key might also appear in the content, consider changing the regex match to a literal comparison on the first field: $1=="\"codeView\""

You can also pipe the data into the awk script instead of reading from a file.

If the input produces many output files, some awk implementations will run out of open files; call close() on each file once you are done with it so awk does not error out.
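A minimal sketch combining the two suggestions above (literal first-field match plus closing each file), assuming a small made-up sample; input.txt and its contents are placeholders, not the real data:

```shell
# Hypothetical sample input; the last line has "codeView" as a value, not a key.
printf '%s\n' \
  '"id" : "1",' \
  '   "codeView" : {' \
  '  "nested" : false' \
  '},' \
  '"codeView" : {' \
  '  "text" : "codeView"' > input.txt

awk '
  BEGIN { out = "_temp0.txt" }      # lines before the first match land here
  $1 == "\"codeView\"" {            # first field only, so leading spaces are ignored
    close(out)                      # release the previous file before opening the next
    out = "_temp" (++c) ".txt"
  }
  { print > out }
' input.txt

wc -l _temp0.txt _temp1.txt _temp2.txt
```

With the plain /"codeView"/ regex the last sample line would start another file; comparing $1 avoids that because only the key position is tested.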

Upvotes: 2

Ed Morton

Reputation: 204648

This will work robustly in any awk:

awk '/"codeView"/{close(out); out="_temp" ++c ".txt"} out!=""{print > out}' file
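For the "input coming from an API" part of the question, the same command should work on a pipe if the file argument is dropped. A rough sketch, where printf is only a stand-in for whatever program actually produces the data:

```shell
# printf is a placeholder for the real API client; the sample lines are made up.
printf '%s\n' \
  '"events" : [ {' \
  '"codeView" : {' \
  '  "nested" : false' \
  '},' \
  '          "codeView" : {' \
  '  "nested" : true' |
awk '/"codeView"/{close(out); out="_temp" ++c ".txt"} out!=""{print > out}'

# Show the first line of each piece.
head -n 1 _temp1.txt _temp2.txt
```

Note that lines before the first "codeView" are discarded here, because out is still empty until the first match; numbering therefore starts at _temp1.txt.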

Upvotes: 4

John1024

Reputation: 113994

Try:

csplit -f _temp -b %d.tmp file '/codeView/' '{*}'

Or, if the data comes from some other program:

my_api | csplit -f _temp -b %d.tmp - '/codeView/' '{*}'

How it works

-f _temp -b %d.tmp

These two options set the names of the split files to the format that you want.

file

Replace this with the name of your input file. Use - if the input is to come from stdin.

/codeView/

This is the regex that you want to split on.

'{*}'

This tells csplit not to stop at the first match but to keep splitting.
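To see which piece ends up where, here is a small sketch on made-up input (csplit_input.txt is a placeholder name). As far as I know, -b and '{*}' are GNU extensions, so this assumes GNU coreutils csplit; -s is added only to silence the byte counts csplit normally prints.

```shell
# Hypothetical sample: one header line, then two "codeView" sections.
printf '%s\n' \
  'header' \
  '  "codeView" : {' \
  'a' \
  '"codeView" : {' \
  'b' > csplit_input.txt

# Everything before the first match goes to _temp0.tmp;
# each later piece starts at a line matching codeView.
csplit -s -f _temp -b %d.tmp csplit_input.txt '/codeView/' '{*}'

head _temp0.tmp _temp1.tmp _temp2.tmp
```

Use -b %d.txt instead if the pieces should carry the .txt extension from the question.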

Upvotes: 2
