HPWD

Reputation: 2240

Extract content from an HTML snippet

I have the following HTML snippet and I'm looking for a better way to parse out the last name.

<TABLE BORDER="0" class="info" width="560">
<TR>
   <TD VALIGN="top"> <B>First Name<B></FONT> </TD>
   <TD VALIGN="top"> <INPUT TYPE="text" NAME="First_Name" SIZE="16" value="Ashley"> </TD>
   <TD VALIGN="top"> <B>Last Name<B></FONT> </TD>
   <TD VALIGN="top"> <INPUT TYPE="text" NAME="Last_Name" SIZE="16" value="Smith"> </TD>
</TR>
<tr>
   <TD VALIGN="top" colspan="2"> <B>Company Name (if any):<B></FONT> </TD>
   <TD VALIGN="top" colspan="2"> <INPUT TYPE="text" NAME="Company_Name" SIZE="24" value=""> </TD>
</tr>
<TR>
   <TD VALIGN="top" colspan=2> <B>Address<B></FONT> </TD>
   <TD VALIGN="top" colspan=2> <INPUT TYPE="text" NAME="Address" SIZE="24" value="123 Any Street Circle "> </TD>
</TR>
<tr>
   <TD VALIGN="top" colspan=2> <B>City <B></FONT> <INPUT type="text" id="City" name="City" SIZE="14" value="Shady Town"> </TD>
   <TD colspan="2" VALIGN="top"> <B>State<B></FONT> <INPUT type="text" id=State name=State SIZE="4" value="Tx"> <B>Zip<B></FONT> <INPUT type="text" id=Zip name=Zip SIZE="8

I have the following, but I'm pretty sure I can do this without the replace calls. What I'm trying to do below is find the starting point, find the end point, and then take the text in between. Once I have that, I remove the "matched" markup, leaving me with the value of the input field.

<cfset LastName_start = findNoCase('<INPUT TYPE="text" NAME="Last_Name" SIZE="16" value="', theString, 0)>  
#lastName_start#  

<cfset LastName_end = findNoCase('">', theString, 0)>
#lastName_end#
<cfset lastNameValue = '#Mid(theString,LastName_start,LastName_end)#'>
#lastNameValue#

<cfset lastNameValue = replace(lastNameValue, '<INPUT TYPE="text" NAME="Last_Name" SIZE="16" value="', '')>
<cfset lastNameValue = replace(lastNameValue, '">', '')>
<cfset lastNameValue = listFirst(lastNameValue,'"')>
<cfdump var="#lastNameValue#" label="lastNameValue">

Any tips on how I can clean this up using ColdFusion? This is an ethical exercise.

And yes, I did try to format this.

Upvotes: 2

Views: 321

Answers (1)

SOS

Reputation: 6550

I second Scott Stroz's suggestion about trying JSoup. It usually works well, and is very simple to use.

Download the JSoup jar and load it in your Application.cfc.

component {
    this.name = "MyApplication";
    this.javaSettings = { loadPaths = ["C:\path\to\jsoup-1.12.1.jar"] };
    // ... more application settings
}

Get a reference to the JSoup class, parse the HTML string, and use val() to grab the value of the first matching input element. val() returns an empty string if no element matched or if the element has no value set.

You can find a bunch of other helpful examples in the JSoup Cookbook.

<cfscript>
    yourHTMLString = '<TABLE BORDER="0" class="info" ......';

    // parse html
    jsoup = createObject("java", "org.jsoup.Jsoup");
    root = jsoup.parse( yourHTMLString );

    // get the first matching value ...     
    lastName = root.select("input[name='Last_Name']").val();
    firstName = root.select("input[name='First_Name']").val();
    companyName = root.select("input[name='Company_Name']").val();
    cityName = root.select("input[name='City']").val();
    stateName = root.select("input[name='State']").val();
    address = root.select("input[name='Address']").val();
</cfscript>

Results:

Image of extracted values
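
One thing to watch for: val() gives you an empty string both when the input is missing and when it exists with an empty value (like Company_Name in your snippet). If you need to tell those apart, a minimal sketch is to check how many elements the selector matched. This assumes the jar is loaded as shown above; the shortened html string and the Fax field are hypothetical and only there to show the "not found" case.

```cfml
<cfscript>
    // Hypothetical sample: Company_Name is present but empty, Fax is not in the HTML at all
    html = '<input type="text" name="Company_Name" size="24" value="">';

    jsoup = createObject("java", "org.jsoup.Jsoup");
    root = jsoup.parse( html );

    companyMatches = root.select("input[name='Company_Name']");
    faxMatches = root.select("input[name='Fax']");

    // select() returns a list of matched elements, so size() tells us
    // whether anything matched, independent of what val() returns
    companyFound = companyMatches.size() gt 0;
    faxFound = faxMatches.size() gt 0;

    writeOutput("Company_Name found: #companyFound#, value: [#companyMatches.val()#]<br>");
    writeOutput("Fax found: #faxFound#, value: [#faxMatches.val()#]");
</cfscript>
```

Both val() calls come back empty here, but only the Company_Name selector matches an element, so the size() check is what separates "field is blank" from "field isn't in the HTML".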

Upvotes: 3
