Reputation: 8990
This website lists over 250 courses on a single page. I want to get the name of each course and insert it into my MySQL database using PHP. The courses are listed like this:
<td> computer science</td>
<td> media studies</td>
…
Is there a way to do that in PHP, instead of me having a mad data entry nightmare?
Upvotes: 4
Views: 4453
Reputation: 316969
How to parse HTML has been asked and answered countless times before. Regular expressions will work for your specific use case, but in general a proper parser is the better and more reliable choice. Here is how to do it with DOM:
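$dom = new DOMDocument;
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$dom->loadHTMLFile('http://courses.westminster.ac.uk/CourseList.aspx');
// print the text of each <td> cell, one per line
foreach ($dom->getElementsByTagName('td') as $title) {
    echo trim($title->nodeValue), PHP_EOL;
}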
To insert the data into MySQL, use the mysqli extension. Examples are plentiful on Stack Overflow, so please use the search function.
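As a rough sketch of how the two steps could fit together: the helper names `extractCourseNames` and `insertCourseNames`, the table `courses`, and its column `name` are assumptions made for illustration, not anything the site or your schema defines. The insert goes through a single prepared statement, so the course names are never concatenated into the SQL.

```php
<?php
// Hypothetical helper: returns the trimmed, non-empty text of every <td> in $html.
function extractCourseNames(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument;
    // Collect parser warnings instead of printing them; real pages are rarely valid HTML.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($dom->getElementsByTagName('td') as $cell) {
        $name = trim($cell->nodeValue);
        if ($name !== '') {
            $names[] = $name;
        }
    }
    return $names;
}

// Hypothetical helper: inserts each name through one prepared statement.
// The table `courses` and the column `name` are assumed, adjust to your schema.
function insertCourseNames(mysqli $db, array $names): void
{
    $stmt = $db->prepare('INSERT INTO courses (name) VALUES (?)');
    if ($stmt === false) {
        throw new RuntimeException('prepare failed: ' . $db->error);
    }
    $name = '';
    $stmt->bind_param('s', $name); // bound by reference, re-read on every execute()
    foreach ($names as $current) {
        $name = $current;
        $stmt->execute();
    }
    $stmt->close();
}
```

To use it, you would fetch the page (for example with `file_get_contents`), pass the result to `extractCourseNames`, and hand the returned array to `insertCourseNames` together with your own `mysqli` connection. If the page has other table cells besides course names, you will need to filter the array first.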
Upvotes: 4
Reputation: 76736
Just for fun, here's a quick shell script to do the same thing.
curl http://courses.westminster.ac.uk/CourseList.aspx \
| sed '/<td>\(.*\)<\/td>/ { s/.*">\(.*\)<\/a>.*/\1/; b }; d;' \
| uniq > courses.txt
Upvotes: 0
Reputation: 155
I ran into the same problem. A good library for this is PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/. It lets you select elements in a way similar to jQuery.
Upvotes: 0
Reputation: 6307
Regular expressions work well.
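// get the page (URL taken from the other answers)
$page = file_get_contents('http://courses.westminster.ac.uk/CourseList.aspx');
$lines = preg_split("/\r?\n/", $page);
foreach ($lines as $text) {
    // only lines consisting of a single <td>...</td> cell match
    if (preg_match("/^\s*<td>(.*)<\/td>\s*$/", $text, $matches)) {
        // insert trim($matches[1]) into the database
    }
}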
See the documentation for preg_match.
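The line-by-line match only finds cells that sit alone on a line. If several cells share a line, a variation with `preg_match_all` over the whole string may be more forgiving. This is a minimal sketch, assuming the cells are bare `<td>` tags without attributes, as in the question; `matchCourseNames` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the trimmed contents of every bare <td>...</td> pair.
function matchCourseNames(string $page): array
{
    // (.*?) is non-greedy, so each match stops at the first closing tag;
    // the s flag lets . span newlines and the i flag ignores tag case.
    preg_match_all('~<td>(.*?)</td>~is', $page, $matches);

    // strip_tags() drops nested markup such as links inside a cell.
    return array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $matches[1]);
}
```

Cells written as `<td class="...">` would not match this pattern, which is one reason a real parser tends to be more robust.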
Upvotes: 4
Reputation: 23255
You can use this HTML parsing library for PHP to achieve this: http://simplehtmldom.sourceforge.net/
Upvotes: 2