E3pO

Reputation: 503

I'm terrible with regex, can you help?

I've spent the past 4 hours trying to write a regex that extracts the information below into an array I can loop over. If this isn't working in about 2 hours, 304 people won't get a text message telling them that our school system has a cancellation.

http://www.wane.com/generic/weather/closings/School_Delays_and_Closings

<tr class="B">
<td width="35%">Blackhawk Christian School</td>

<td width="25%">Allen</td>

<td width="80%">2 Hour Delay&nbsp;</td>
</tr>

<tr class="S">
<td width="35%">Southwest Allen County Schools</td>

<td width="25%">Allen</td>

<td width="80%">2 Hour Delay&nbsp;</td>
</tr>

What I need is: for each td width="35%", add an entry to an array containing the school system's name together with the text of the matching td width="80%". I don't need this for just one school system; I need to check every one in the list and display it to the user.

I'm doing:

$wanetv = get_url_contents("http://www.wane.com/generic/weather/closings/School_Delays_and_Closings");

To grab the webpage.


EDIT:

Tried to convert some C# posted below into PHP... can't quite figure it out. Here's my attempt:

   $a = "<tr class='B'> <td width='35%'>Blackhawk Christian School</td> <td width='25%'>Allen</td> <td width='80%'>2 Hour Delay&nbsp;</td> </tr> <tr class='S'> <td width='35%'>Southwest Allen County Schools</td><td width='25%'>Allen</td><td width='80%'>2 Hour Delay&nbsp;</td> </tr> ";
    $SchoolNameKeyword = "<td width='35%'>";
    $DelayKeyword = "<td width='80%'>";

    while (strlen(strstr($a, $SchoolNameKeyword))>0)
    {

        $a = substr($a,strrpos($a, $SchoolNameKeyword)+strlen($SchoolNameKeyword));
        $schoolName = substr($a, 0,strrpos( $a, "<"));
        $a = substr($a,strrpos($a, $DelayKeyword) + strlen($DelayKeyword));
        $delay = substr( $a, 0,strrpos( $a, "<"));

        $arr[$schoolName] = $delay;
    }
        print_r($arr);

Prints out:

Array
(
    [Southwest Allen County SchoolsAllen2 Hour Delay  ] => 2 Hour Delay  
)

Upvotes: 1

Views: 746

Answers (5)

salathe

Reputation: 51950

You would really, really, really be better off using an HTML parser here instead of Regular Expressions... especially when you don't control the source, and they could easily break your regex parsing, while HTML parsing would be somewhat more likely to stay working.

- Andrew Barber

An example using PHP's DOM extension might look something like the following. However, I would take exception to Andrew's comment about HTML parsing being "somewhat more likely to stay working", as changes in the source HTML may affect it just as much as any regular expression.

$doc = new DOMDocument;

// Temporarily use "internal" XML error handling to keep HTML warnings quiet
libxml_use_internal_errors(true);
$doc->loadHTMLFile('http://www.wane.com/generic/weather/closings/School_Delays_and_Closings');
libxml_use_internal_errors(false);

// Find each <tr> for our schools
$xpath = new DOMXPath($doc);
$rows  = $xpath->query('//h2[.="Schools: ALL"]/following-sibling::table/tbody/tr[count(td) = 3]');

// Build array of name, county and delay information for each school
$schools = array();
foreach ($rows as $row) {
    $tds    = $row->getElementsByTagName('td');
    $school = $tds->item(0)->textContent;
    $info   = $tds->item(2)->textContent;
    $schools[$school] = $info;
}

echo "Found {$rows->length} schools:" . PHP_EOL;
print_r($schools);

The above uses classes/techniques that you are probably not familiar with. Do ask questions.
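For trying the same idea without hitting the live page, here is a minimal sketch that runs the DOM approach against the two sample rows from the question. The wrapping <table> element and the simpler XPath expression are assumptions made for the sample; the real page's layout may differ.

```php
<?php
// Sample rows copied from the question, wrapped in a <table> (an assumption for this sketch).
$html = <<<'HTML'
<table>
<tr class="B">
<td width="35%">Blackhawk Christian School</td>
<td width="25%">Allen</td>
<td width="80%">2 Hour Delay&nbsp;</td>
</tr>
<tr class="S">
<td width="35%">Southwest Allen County Schools</td>
<td width="25%">Allen</td>
<td width="80%">2 Hour Delay&nbsp;</td>
</tr>
</table>
HTML;

$doc = new DOMDocument;
libxml_use_internal_errors(true);   // keep HTML parser warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath   = new DOMXPath($doc);
$schools = array();

// Every row with exactly three cells: name, county, status
foreach ($xpath->query('//tr[count(td) = 3]') as $row) {
    $tds  = $row->getElementsByTagName('td');
    $name = trim($tds->item(0)->textContent);
    // &nbsp; is decoded to U+00A0 (bytes C2 A0 in UTF-8), which trim() alone does not strip
    $delay = trim(str_replace("\xC2\xA0", ' ', $tds->item(2)->textContent));
    $schools[$name] = $delay;
}

print_r($schools);
```

Replacing the non-breaking space before trimming keeps the status text clean, which matters if it ends up in a text message.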

Upvotes: 7

mario

Reputation: 145482

Using phpQuery/QueryPath is the simplest option. It's doable with regular expressions, but difficult to get right for newcomers.
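To illustrate the regular-expression route, here is a rough sketch using preg_match_all on the sample rows from the question. It assumes the width attributes are written exactly as shown, with double quotes and no other attributes, and it will stop matching as soon as the markup changes.

```php
<?php
// Sample markup from the question; the live page is assumed to use the same cell attributes.
$html = '<tr class="B">
<td width="35%">Blackhawk Christian School</td>
<td width="25%">Allen</td>
<td width="80%">2 Hour Delay&nbsp;</td>
</tr>
<tr class="S">
<td width="35%">Southwest Allen County Schools</td>
<td width="25%">Allen</td>
<td width="80%">2 Hour Delay&nbsp;</td>
</tr>';

// Lazy quantifiers (.*?) stop at the first closing tag; the "s" flag lets "." span newlines.
$pattern = '~<td width="35%">(.*?)</td>.*?<td width="80%">(.*?)</td>~s';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Decode entities, then turn the non-breaking space (C2 A0 in UTF-8) into a plain space.
$clean = function ($cell) {
    $text = html_entity_decode(strip_tags($cell), ENT_QUOTES, 'UTF-8');
    return trim(str_replace("\xC2\xA0", ' ', $text));
};

$closings = array();
foreach ($matches as $m) {
    $closings[$clean($m[1])] = $clean($m[2]);
}

print_r($closings);
```

PREG_SET_ORDER groups the captures per row, so each $m holds one school name and its status together.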

A good alternative is to just use an HTML <table> to array conversion class. Since your data is already in a useful structure the workaround over DOM nodes seems wacky. There are some quick to google examples:

Upvotes: 0

Pabuc

Reputation: 5638

$a = "<tr class='B'> <td width='35%'>Blackhawk Christian School</td> <td width='25%'>Allen</td> <td width='80%'>2 Hour Delay&nbsp;</td> </tr> <tr class='S'> <td width='35%'>Southwest Allen County Schools</td><td width='25%'>Allen</td><td width='80%'>2 Hour Delay&nbsp;</td> </tr> "; 

$SchoolNameKeyword = "<td width='35%'>"; 
$DelayKeyword = "<td width='80%'>"; 
$schoolNames = array();
$delays = array();

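// Walk the string left to right: strpos() finds the FIRST match, whereas strrpos() finds the last
while (($pos = strpos($a, $SchoolNameKeyword)) !== false)
{
    // Skip past the opening tag of the school-name cell, then read up to the next "<"
    $a = substr($a, $pos + strlen($SchoolNameKeyword));
    $schoolName = substr($a, 0, strpos($a, "<"));
    // Do the same for the delay cell that follows
    $a = substr($a, strpos($a, $DelayKeyword) + strlen($DelayKeyword));
    $delay = substr($a, 0, strpos($a, "<"));

    $schoolNames[] = $schoolName;
    $delays[] = $delay;
}
// Compare against the number of entries, not the array itself
for ($i = 0; $i < count($delays); $i++) {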
    echo "School: " . $schoolNames[$i] . "\n";
    echo "Delay: " . $delays[$i] . "\n";
}

Upvotes: 1

Rune Aamodt

Reputation: 2611

Are you sure regex is the best way to solve this problem? What about using some kind of HTML DOM API to traverse the table?

Upvotes: 0

Andrew Barber

Reputation: 40150

You would really, really, really be better off using an HTML parser here instead of Regular Expressions... especially when you don't control the source, and they could easily break your regex parsing, while HTML parsing would be somewhat more likely to stay working.

Upvotes: 8
