Reputation: 2540
I'm trying to format a block of text using regular expressions. I'll explain what I have so far and then describe my problem.
I have a text file containing some narrative text:
VOLUME I
CHAPTER I
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type
It was popularised in the 1960s with the release of Letraset sheets containing
Lorem Ipsum passages, and more recently with desktop publishing software like
Aldus PageMaker including versions of Lorem Ipsum.
VOLUME II
CHAPTER II
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
It has survived not only five centuries, but also the leap into electronic
typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets
containing Lorem Ipsum passages, and more recently with desktop
publishing software like Aldus PageMaker including versions of Lorem Ipsum.
...
...
It has multiple volumes and chapters, and PHP needs to output it so that it looks the way it does in the text file, with the appropriate spacing.
First, I call this formatting function to clean up the whitespace and split the text:
<?php
function formatting($AStr)
{
    return preg_split('/[\r\n]{2,}/', trim($AStr));
}
?>
Then I load the file and try to format it:
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<h1>Jane Austen</h1>
<h2>Emma</h2>
<?php
require_once 'format.inc.php';
$p = file_get_contents('emma.txt');
$p = formatting($p);
/*
foreach ($p as $l) {
$l = trim($l);
preg_replace('/(VOLUME +[IVX]+)/', "jjj", $l);
$volumePattern = '/(VOLUME +[IVX]+)/';
$chaperPattern = '/(CHAPTER +[IVX]+)/';
$l = str_replace("\r\n", ' ', $l);
if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
echo '<h3>' . $m[1] . '</h3>';
}
if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
echo '<h3>' . $m[1] . '</h3>';
}
preg_replace('/(VOLUME +[IVX]+)/', "jjj", $l);
echo $l . "\n";
}*/
foreach ($p as $l) {
    //$l = trim($l);
    //$l = str_replace("[\r\n]", '\n', $l);
    if (preg_match('/[\.\w]/', $l, $m)) {
        echo "\n";
    }
    if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(VOLUME +[IVX]+)/', '', $l);
    if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(CHAPTER +[IVX]+)/', '', $l);
    echo $l . "\n";
}
?>
</body>
</html>
The problem is that I can't get the blank line between paragraphs to print. I tried using this:
if (preg_match('/[\.\w]/', $l, $m)) {
echo "\n";
}
Upvotes: 0
Views: 147
Reputation: 88697
This might be massively over-simplified, but can't you just do this?
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<h1>AUTHOR NAME</h1>
<h2>TITLE</h2>
<?php
$p = file_get_contents('emma.txt');
echo preg_replace('/^\s*((?:VOLUME|CHAPTER)\s+[IVX]+)\s*$/im', '<h3>$1</h3>', $p);
?>
</body>
</html>
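As a quick check of what that pattern does, here is a minimal sketch that runs it on an inline string instead of emma.txt. The `$sample` text is made up for illustration; everything else is the same call as above.

```php
<?php
// Made-up sample standing in for the contents of emma.txt.
$sample = "VOLUME I\n\nCHAPTER I\n\nLorem Ipsum is simply dummy text.\n";

// Same pattern as above: a line that holds only VOLUME/CHAPTER
// followed by a Roman numeral gets wrapped in <h3>.
$html = preg_replace(
    '/^\s*((?:VOLUME|CHAPTER)\s+[IVX]+)\s*$/im',
    '<h3>$1</h3>',
    $sample
);

echo $html;
```

The body text comes back untouched. Any newlines left in it are collapsed by the browser like all other HTML whitespace, so paragraphs need their own markup (or CSS such as `white-space: pre-line`) to stay visually separate.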
EDIT
To also wrap the body paragraphs in <p></p> (assuming there are no newlines within a paragraph), try this:
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<h1>AUTHOR NAME</h1>
<h2>TITLE</h2>
<?php
$p = file_get_contents('emma.txt');
echo preg_replace_callback('/^\s*(?:(?P<header>(?:VOLUME|CHAPTER)\s+[IVX]+)|(?P<body>.+))\s*$/im', function($matches) {
    if (!empty($matches['body'])) {
        return '<p>'.htmlspecialchars($matches['body']).'</p>';
    } else {
        return '<h3>'.htmlspecialchars($matches['header']).'</h3>';
    }
}, $p);
?>
</body>
</html>
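If the paragraphs in the file do span several lines, one option is to split on blank lines first and then handle each block separately. What follows is only a rough sketch, not tested against the real emma.txt: `renderBook` and `wrapParagraph` are names made up for illustration, and it assumes paragraphs are separated by at least one blank line.

```php
<?php
// Hypothetical helper: joins the lines of one paragraph and wraps them in <p>.
function wrapParagraph(array $lines): string
{
    if (!$lines) {
        return '';
    }
    return '<p>' . htmlspecialchars(implode(' ', $lines)) . "</p>\n";
}

// Hypothetical helper: turns the raw text into <h3> headings and <p> paragraphs.
function renderBook(string $text): string
{
    $html = '';
    // A line break followed by one or more (possibly whitespace-only) empty lines.
    $blocks = preg_split('/\R(?:[ \t]*\R)+/', trim($text));
    foreach ($blocks as $block) {
        $paragraph = [];
        foreach (preg_split('/\R/', $block) as $line) {
            $line = trim($line);
            if (preg_match('/^(?:VOLUME|CHAPTER)\s+[IVX]+$/i', $line)) {
                // Flush any text collected so far, then emit the heading.
                $html .= wrapParagraph($paragraph);
                $paragraph = [];
                $html .= '<h3>' . htmlspecialchars($line) . "</h3>\n";
            } elseif ($line !== '') {
                $paragraph[] = $line;
            }
        }
        $html .= wrapParagraph($paragraph);
    }
    return $html;
}

// Made-up sample; in the page above this would be file_get_contents('emma.txt').
$sample = "VOLUME I\n\nCHAPTER I\n\nFirst line\nsecond line.\n\nAnother paragraph.\n";
echo renderBook($sample);
```

Splitting first keeps the regex simple: the heading pattern only ever sees a single trimmed line, and whatever is left in a block becomes one paragraph.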
Upvotes: 3
Reputation: 144
You have several errors. First, in the formatting function the regexp must be:
function formatting($AStr)
{
    return preg_split('/[\r\n]{2,}/', trim($AStr));
}
Next, note that preg_replace does not take its subject by reference. It returns the modified string, so you must assign its return value back to your variable:
foreach ($p as $l) {
    $l = trim($l);
    // preg_replace() returns the new string and leaves $l untouched, so a bare
    // call like the "jjj" test in the question has no effect; assign the result.
    $l = str_replace("\r\n", ' ', $l);
    if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(VOLUME +[IVX]+)/', '', $l);
    if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(CHAPTER +[IVX]+)/', '', $l);
    echo $l . "\n";
}
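To see the difference in isolation, here is a small sketch with made-up variable names (`$line`, `$unchanged`, `$replaced`):

```php
<?php
$line = 'VOLUME I';

// Calling preg_replace() and ignoring the return value changes nothing.
preg_replace('/VOLUME +[IVX]+/', 'jjj', $line);
$unchanged = $line;
var_dump($unchanged); // string(8) "VOLUME I"

// Assigning the return value is what gives you the replaced text.
$replaced = preg_replace('/VOLUME +[IVX]+/', 'jjj', $line);
var_dump($replaced); // string(3) "jjj"
```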
Upvotes: 1