Reputation: 449
Hey guys, I'm writing a script to fetch words/results from this site (http://grecni.com/texttwist.php). I already have the HTTP POST request working, etc.
The only thing I need now is to extract the words. I'm working with HTML source that looks like this:
<html>
<head>
<title>Text Twist Unscrambler</title>
<META NAME="keywords" CONTENT="Text,Twist,Text Twist,Unscramble,Free,Source,php">
</head>
<body>
<font face="arial,helvetica" size="3">
<p>
<b>3 letter words</b><br>sae sac ess aas ass sea ace sec <p>
<b>4 letter words</b><br>cess secs seas ceca sacs case asea casa aces caca <p>
<b>5 letter words</b><br>cacas casas caeca cases <p>
<b>6 letter words</b><br>access <br><br>
Found 23 words in 0.22962 seconds
<form action="texttwist.php" method="post">
enter scrambled letters and I'll return all word combinations<br>
<input type="text" name="l" value="asceacas" size="20" maxlength="20">
<input type="submit" name="button" value="unscramble">
<input type="button" name="clear" value="clear" onClick="this.form.l.value='';">
</form><p>
<a href=texttwist.phps>php source</a>
- it's kinda ugly, but it's fast<p>
<a href=/>back to my page</a>
</body>
</html>
I'm trying to fetch the words like "sae", "sac", "secs", "seas", "casas", etc.
Any help?
This is the farthest I've gotten, and I don't know what to do from here: link text
Any suggestions? Help?
Upvotes: 0
Views: 211
Reputation: 27222
If you want any kind of robustness, you really want a parser. As Adrian mentioned, Nokogiri is the most popular solution.
If you insist, and are aware of the madness you may be in for as the page becomes more complex, the following may help:
Search for a line that matches
/^<b>\d+ letter words/
and then you can dig out the bits like so:
a = line.split(/<br>/)[1] # the second half
a.gsub!('<p>', '') # take out the trailing <p>
res = a.split(' ') # this is your data
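Putting those steps together, a minimal sketch might look like the following. It assumes the page layout stays exactly as in the question (one word list per line, each starting with <b>N letter words</b>). The helper name extract_words and the inline sample string are made up for illustration; in your script the HTML would come from your POST response instead.

```ruby
# Hypothetical helper (not from the original post): collects the words from
# every "<b>N letter words</b>" line of the page source.
def extract_words(html)
  html.each_line.flat_map do |line|
    # Skip anything that is not a word-list line.
    next [] unless line =~ /^<b>\d+ letter words/

    list = line.split(/<br>/)[1].to_s  # text after the first <br>
    list = list.gsub('<p>', '')        # drop the trailing <p>, if present
    list.split(' ')                    # split on whitespace
  end
end

# Sample input copied from the page source in the question.
html = <<~HTML
  <p>
  <b>3 letter words</b><br>sae sac ess aas ass sea ace sec <p>
  <b>4 letter words</b><br>cess secs seas ceca sacs case asea casa aces caca <p>
  <b>5 letter words</b><br>cacas casas caeca cases <p>
  <b>6 letter words</b><br>access <br><br>
  Found 23 words in 0.22962 seconds
HTML

words = extract_words(html)
puts words.length
puts words.first(3).inspect
```

For the sample above, the count should come out to 23, which matches the "Found 23 words" line on the page and makes for a cheap sanity check.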
That being said, this isn't anything you want in production code. You'll be surprised how learning a parser will change how you see this problem.
Upvotes: 0