Reputation: 167
Our customer supplied us with XML data that needs to be processed using PHP. They chose to abuse attributes by using them for big chunks of text (containing line breaks). The XML parser replaces those line breaks with spaces, as the W3C XML specification requires (attribute-value normalization).
To make sure we do not lose our line breaks, I want to read in the file as a string, then replace every line break that sits between double quotes with a character reference (for example &#13; for a carriage return). I think I need a regular expression for that, but I am having trouble coming up with one.
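To illustrate why a character reference would help, here is a minimal sketch, assuming the SimpleXML extension is available. The tag and attribute names are just the ones from my test data. A literal line break in an attribute value is normalized to a space, while a line break written as &#10; should survive parsing:

```php
<?php
// Literal line break inside the attribute value.
$literal = simplexml_load_string("<tag attribute=\"Header\nFirst paragraph.\"/>");

// Same value, with the line break written as the character reference &#10;.
$escaped = simplexml_load_string('<tag attribute="Header&#10;First paragraph."/>');

// The parser turned the literal line break into a space.
var_dump((string) $literal['attribute'] === "Header First paragraph.");

// The character reference was resolved to a real line feed.
var_dump((string) $escaped['attribute'] === "Header\nFirst paragraph.");
```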
This is my test code (PHP 5) so far, using a look-ahead and look-behind, but it does not work:
$xml = '<tag attribute="Header\r\rFirst paragraph.">\r</tag>';
$pattern = '/(?<=")([^"]+?)\r([^"]+?)(?=")/';
print_r( preg_replace($pattern, "$1 $2", $xml) );
Can anyone help me get this right? Should be easy for a seasoned regexp master :)
Upvotes: 0
Views: 959
Reputation: 167
Exactly, that is what I ended up with. For future reference I will post the working code here:
<?php
header("Content-Type: text/plain");
// double-quoted, so that \r is a real carriage return and not a backslash followed by "r"
$xml = "<tag attribute=\"Header\r\rFirst paragraph.\">\r</tag>";
// split the contents at every quote (no "+", so an empty attribute="" keeps its own part)
$array = preg_split('/"/', $xml);
// replace line breaks in each of the odd-numbered parts (the text between quotes)
for ($i = 1; $i < count($array); $i += 2) {
    // two-character line breaks are listed first so each becomes a single space
    $array[$i] = str_replace(array("\r\n", "\n\r", "\r", "\n"), ' ', $array[$i]);
}
// reconstruct the original string
$xml = implode('"', $array);
print_r( $xml );
?>
Thanks for replying and supporting this solution :)
Upvotes: 1
Reputation: 72530
The best method would be to search character-by-character instead. Set a boolean to true if you encounter a quote mark, then to false when you find the matching quote.
If you find a newline character while you are inside the quotes (i.e. your variable is true), then "translate" it, whatever you mean by that. Otherwise leave it alone.
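The scan described above could look roughly like the sketch below. The function name escapeQuotedLineBreaks is hypothetical, and the choice of &#13; and &#10; as replacements is an assumption about what "translate" should mean here. It also assumes that double quotes only ever delimit attribute values (no single-quoted attributes, and no quote characters in text content, comments or CDATA sections):

```php
<?php
// Hypothetical helper: replaces line breaks with character references,
// but only while the scan is between a pair of double quotes.
function escapeQuotedLineBreaks($xml)
{
    $insideQuotes = false;
    $result = '';
    $length = strlen($xml);

    for ($i = 0; $i < $length; $i++) {
        $char = $xml[$i];

        if ($char === '"') {
            // Every quote toggles the state: opening quote on, closing quote off.
            $insideQuotes = !$insideQuotes;
            $result .= $char;
        } elseif ($insideQuotes && $char === "\r") {
            $result .= '&#13;';
        } elseif ($insideQuotes && $char === "\n") {
            $result .= '&#10;';
        } else {
            // Everything else, including line breaks outside quotes, is copied as-is.
            $result .= $char;
        }
    }

    return $result;
}

$xml = "<tag attribute=\"Header\r\rFirst paragraph.\">\r</tag>";
echo escapeQuotedLineBreaks($xml);
```

Scanning byte by byte should be safe for UTF-8 input, because the quote, carriage return and line feed are single-byte ASCII characters that never occur inside a multi-byte sequence.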
Upvotes: 1