Nyxynyx

Convert Text within Tables to Plain text with linebreaks

Given a chunk of HTML that lays out data in <div> and <table> elements, how can all the HTML/CSS markup be removed while keeping the text from the individual cells and divs, separated only by line breaks?

My current attempt, shown below, outputs one long continuous paragraph instead of preserving the separation that the div and table layout provides.

Original HTML: http://pastebin.com/63N3Kg16

Output:

John Smith | SomeName Realty | (xxx) 939-4835 Allston St, Cambridge, MA Very spacious under renovation with SST/Granite, porch, minutes to MIT, redline, Nov/1 4BR/1BA Apartment $3,400/month Bedrooms 4 Bathrooms 1 full, 0 partial Sq Footage Unspecified Parking None Pet Policy No pets Deposit $0 DESCRIPTION Triple decker building secondfloor apt aprox 2000 sqf with large bedrooms, kitchen, pantry, porch, d/w, all woodfloor and ZTilded in the kitchen, new bath. utilities extra,Nov/1 see additional photos below Contact info: Payman Ahmadifar Bayside Realty (xxx) 939-4835 Posted: Sep 24, 2012, 6:55am PDT

PHP

nl2br(trim(strip_tags($html)));

Expected Output

Plain text with either <br> or newlines, and no <div> or <table> markup. The goal is to make the text more readable by keeping the spacing and separation of the original, with no CSS styling and no HTML markup other than <br>.

John Smith | SomeName Realty | (xxx) 939-4835 

Allston St, Cambridge, MA 

Very spacious under renovation with SST/Granite, porch, minutes to MIT, redline, Nov/1 

4BR/1BA Apartment $3,400/month 

Bedrooms 4 
Bathrooms 1 full, 0 partial 
Sq Footage Unspecified 
Parking None 
Pet Policy No pets 
Deposit $0 

DESCRIPTION 
Triple decker building secondfloor apt aprox 2000 sqf with large bedrooms, kitchen, pantry, porch, d/w, all woodfloor and ZTilded in the kitchen, new bath. utilities extra,Nov/1 see additional photos below 

Contact info: Payman Ahmadifar Bayside Realty (xxx) 939-4835 
Posted: Sep 24, 2012, 6:55am PDT

Answers (1)

Baba

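You can get close with some string manipulation. After strip_tags() the whitespace that sat between the elements is still in the string, so, assuming the cells in your HTML are separated by at least three spaces, you can treat a run of three spaces as a cell boundary: replace each run with a marker, split on the marker, and collapse the remaining whitespace in each piece.

Try: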

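// Strip the markup; the whitespace between the former elements remains.
$string = strip_tags($html);
// Mark every run of three spaces as a cell boundary.
$string = str_replace(str_repeat(' ', 3), '*****', $string);
// Split on the marker and collapse leftover whitespace in each piece.
$newString = array_map(function ($var) { return trim(preg_replace('!\s+!', ' ', $var)); }, explode('*****', $string));
print(implode("\n", $newString));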
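
If the spacing in the source HTML is not reliable, a variation is to add the line breaks before the tags are stripped. The sketch below is an assumption, not something tested against the pastebin markup: html_to_lines() is a hypothetical helper name, and the list of tags treated as line-ending is a guess at what the page uses.

```php
<?php
// Hypothetical helper: converts block-level HTML into newline-separated text.
function html_to_lines(string $html): string
{
    // Collapse whitespace in the source so only the inserted newlines count.
    $html = preg_replace('/\s+/', ' ', $html);
    // Add a newline after tags that end a visual block (assumed tag list).
    $html = preg_replace('#</(div|tr|p|h[1-6]|li|table)\s*>|<br\s*/?>#i', '$0' . "\n", $html);
    // Keep a space between adjacent cells of the same row.
    $html = preg_replace('#</(td|th)\s*>#i', '$0 ', $html);
    // Remove the markup and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Tidy each line: collapse runs of whitespace and trim the ends.
    $lines = array_map(function ($line) {
        return trim(preg_replace('/\s+/', ' ', $line));
    }, explode("\n", $text));
    // Drop lines that ended up empty.
    $lines = array_filter($lines, function ($line) {
        return $line !== '';
    });
    return implode("\n", $lines);
}
```

Calling nl2br(html_to_lines($html)) would give the <br> form instead of plain newlines. Note that this drops empty lines, so the blank lines between groups in the expected output are not reproduced.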
