Reputation: 63687
Given a chunk of HTML that displays data nicely in <div> and <table>, how can all the HTML/CSS markup be removed while maintaining the text originally found in individual cells and divs, now separated with only line breaks?
The current attempt, shown below, outputs one long continuous paragraph instead of keeping the separation the text had when it was in div or table form.
Original HTML: http://pastebin.com/63N3Kg16
Output:
John Smith | SomeName Realty | (xxx) 939-4835 Allston St, Cambridge, MA Very spacious under renovation with SST/Granite, porch, minutes to MIT, redline, Nov/1 4BR/1BA Apartment $3,400/month Bedrooms 4 Bathrooms 1 full, 0 partial Sq Footage Unspecified Parking None Pet Policy No pets Deposit $0 DESCRIPTION Triple decker building secondfloor apt aprox 2000 sqf with large bedrooms, kitchen, pantry, porch, d/w, all woodfloor and ZTilded in the kitchen, new bath. utilities extra,Nov/1 see additional photos below Contact info: Payman Ahmadifar Bayside Realty (xxx) 939-4835 Posted: Sep 24, 2012, 6:55am PDT
PHP
nl2br(trim(strip_tags($html)));
Expected Output
Plain text with either <br> or newline, no <div> or <table> HTML markup. Basically to make the text more readable, maintaining the spacing/separation structure of the original, but with no CSS stylings or HTML markup except for <br>.
John Smith | SomeName Realty | (xxx) 939-4835
Allston St, Cambridge, MA
Very spacious under renovation with SST/Granite, porch, minutes to MIT, redline, Nov/1
4BR/1BA Apartment $3,400/month
Bedrooms 4
Bathrooms 1 full, 0 partial
Sq Footage Unspecified
Parking None
Pet Policy No pets
Deposit $0
DESCRIPTION
Triple decker building secondfloor apt aprox 2000 sqf with large bedrooms, kitchen, pantry, porch, d/w, all woodfloor and ZTilded in the kitchen, new bath. utilities extra,Nov/1 see additional photos below
Contact info: Payman Ahmadifar Bayside Realty (xxx) 939-4835
Posted: Sep 24, 2012, 6:55am PDT
Upvotes: 2
Views: 1118
Reputation: 95141
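You can play around with some string manipulation. The idea is to treat the runs of three spaces that strip_tags() leaves behind between former elements as separators, then clean up each piece.
Try: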
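// Remove all tags; the whitespace between former elements stays in place.
$string = strip_tags($html);
// Mark every run of three spaces (chr(32)) as a separator.
$string = str_replace(chr(32).chr(32).chr(32), "*****", $string);
// Split on the marker, then collapse leftover whitespace inside each piece.
$newString = array_map(function ($var) { return trim(preg_replace('!\s+!', ' ', $var)); }, explode("*****", $string));
print(implode("\n", $newString));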
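If the spacing in the source HTML is not reliable, another option is to turn the tag boundaries themselves into newlines before stripping. The following is only a minimal sketch under a few assumptions: `html_to_lines()` is a hypothetical helper name, the list of tags treated as line-ending is a guess at what the pastebin markup uses, and regex-based tag handling will not cope with every kind of HTML (attributes on `<br>`, nested tables, entities such as `&nbsp;`).

```php
<?php
// Hypothetical helper, not from the original post: convert block-level tag
// boundaries to newlines first, then strip the remaining markup.
function html_to_lines(string $html): string
{
    // Collapse source whitespace so line breaks in the markup itself
    // do not end up as line breaks in the output.
    $html = preg_replace('/\s+/', ' ', $html);
    // End a line at <br> and at the closing tag of common block elements
    // (assumed tag list; adjust it to the actual markup).
    $html = preg_replace('~<br\s*/?>|</(?:div|p|tr|table|h[1-6]|li)\s*>~i', "\n", $html);
    // Keep cells of the same table row on one line, separated by a space.
    $html = preg_replace('~</(?:td|th)\s*>~i', ' ', $html);
    $text = strip_tags($html);
    // Tidy each line: collapse repeated spaces and trim the ends.
    $lines = array_map(function ($line) {
        return trim(preg_replace('/\s+/', ' ', $line));
    }, explode("\n", $text));
    // Drop lines that ended up empty.
    $lines = array_filter($lines, function ($line) {
        return $line !== '';
    });
    return implode("\n", $lines);
}

$sample = "<div>John Smith | SomeName Realty</div>\n<table>\n"
        . "<tr><td>Bedrooms</td>\n<td>4</td></tr>\n"
        . "<tr><td>Parking</td><td>None</td></tr></table>";
echo html_to_lines($sample);
```

For `<br>` output instead of plain newlines, the result can be passed through nl2br(). For anything beyond simple listings like this one, a real parser such as DOMDocument is probably the safer route.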
Upvotes: 1