Reputation: 289
I'm using cURL to fetch a web page and present it to our users. This worked well until I came across a website that relies heavily on Ajax and returns responses formatted like this:
33687|updatePanel|ctl00_SiteContentPlaceHolder_FormView1_upnlOTHER_NATL|
<div id="ctl00_SiteContentPlaceHolder_FormView1_othernationalities">
<h4>
<span class="tooltip_text" onmousemove="widetip=false; tip=''; delayToolTip(event,tip,widetip,0,0);return false"
onmouseout="hideToolTip()">
<span id="ctl00_SiteContentPlaceHolder_FormView1_lblProvideOTHER_NATL">Provide the following information:</span></span>
</h4>
|
266|scriptBlock|ScriptContentNoTags|
document.getElementById('ctl00_SiteContentPlaceHolder_FormView1_dtlOTHER_NATL_ctl00_csvOTHER_NATL').dispose = function() {
Array.remove(Page_Validators, document.getElementById('ctl00_SiteContentPlaceHolder_FormView1_dtlOTHER_NATL_ctl00_csvOTHER_NATL'));
}
So each block of the response has 4 parts: parts 2 and 3 are just identifiers, part 4 is the actual "body", and part 1 is the length of that body. The problem is that we modify the body, so I need to update the length in part 1 to match; otherwise, a parsing error is thrown when the response is inserted into the web page.
I'm trying to figure out a combination of shell commands (awk, sed, whatever) to: a) read the saved file; b) run a regex on it to capture each individual block of information (using '(\d*?)\|(.*?)\|(.*?)\|(.*?)\|'); c) set the first capturing group to the length of the last capturing group; d) save all the regex matches to a new document or back to the original.
Any input from "the collective" would be GREATLY appreciated.
Upvotes: 0
Views: 464
Reputation: 1827
It doesn't look like a single regex will solve this problem, because there is no way to use the first captured group as the count inside a {braces} quantifier. Ideally, something like this would work:
(\d*?)\|([^|]+)\|([^|]+)\|(.{\1})\|
The length value can't be ignored either, because nothing indicates an escape character for the case where a | appears somewhere in the message body. I suggest a straight split on '|' and a 2-dimensional array to store the content. Check every fourth item against the declared length and, if it is too short, append a | and the next item, then advance the read counter. PHP shall explain:
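// $file holds the raw response text (e.g. read with file_get_contents())
$items = explode('|', $file);
$len = count($items);
$output = [];
$oi = 0;   // index of the current block
$ol = -1;  // index of the current field within the block
for ($i = 0; $i < $len; ++$i) {
    $output[$oi][++$ol] = $items[$i];
    if ($ol == 3) {
        // Field 0 is the declared body length; re-join pieces that were split on a '|' inside the body
        $target = (int) $output[$oi][0];
        while (strlen($output[$oi][3]) < $target && $i + 1 < $len) {
            $output[$oi][3] .= '|' . $items[++$i];
        }
        ++$oi;
        $ol = -1;
    }
}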
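To get from parsed blocks back to a valid response with corrected lengths, something along these lines might work. It is only a sketch under a few assumptions: the function names parseBlocks and serializeBlocks are made up for illustration, the parsing is done on the original response (before the body is modified, while the declared lengths are still correct), and the length is assumed to count bytes. If the server counts characters and the content is not plain ASCII, mb_strlen/mb_substr would be the closer fit.

```php
<?php
// Hypothetical helpers, not part of any library.

/** Split "length|type|id|body|" blocks into arrays, trusting each declared length. */
function parseBlocks(string $data): array
{
    $blocks = [];
    $pos = 0;
    $end = strlen($data);
    while ($pos < $end) {
        // Read the three header fields; these never contain '|'.
        $header = [];
        for ($f = 0; $f < 3; ++$f) {
            $bar = strpos($data, '|', $pos);
            if ($bar === false) {
                throw new RuntimeException("Truncated header at offset $pos");
            }
            $header[] = substr($data, $pos, $bar - $pos);
            $pos = $bar + 1;
        }
        [$length, $type, $id] = $header;
        if (!ctype_digit($length)) {
            throw new RuntimeException("Bad length field '$length'");
        }
        $n = (int) $length;
        // The body is taken by length, so a '|' inside it is harmless.
        $body = (string) substr($data, $pos, $n);
        $pos += $n;
        if (strlen($body) !== $n || ($data[$pos] ?? '') !== '|') {
            throw new RuntimeException("Body of block '$id' does not match its length");
        }
        ++$pos; // skip the '|' that closes the block
        $blocks[] = ['type' => $type, 'id' => $id, 'body' => $body];
    }
    return $blocks;
}

/** Rebuild the response, recomputing every length from the current body. */
function serializeBlocks(array $blocks): string
{
    $out = '';
    foreach ($blocks as $b) {
        $out .= strlen($b['body']) . '|' . $b['type'] . '|' . $b['id'] . '|' . $b['body'] . '|';
    }
    return $out;
}

// Usage with a tiny made-up response whose first body contains '|'.
$raw = '5|updatePanel|panel1|a|b|c|3|scriptBlock|s1|x()|';
$blocks = parseBlocks($raw);
$blocks[0]['body'] = str_replace('b', 'bbb', $blocks[0]['body']);
echo serializeBlocks($blocks), "\n";
// 7|updatePanel|panel1|a|bbb|c|3|scriptBlock|s1|x()|
```

Because the length prefix is recomputed at write time, the body can be edited freely between the two calls without tracking the size by hand.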
Upvotes: 1