NoSoup4you

Reputation: 668

How can I extract field names and values from a form easily?

I need to parse field names and values from an HTML form and add them to my database. I know I can search for "input name='", then search again for the closing "'" and pull out the text with the Mid function, and then do the same for the value by searching for "value='". Is there an easier way to loop over the document and extract all input names and their associated values?

Below is a sample of the page I need to parse:

    <input name='a_glare' value='B' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</td> 
<td align="center"> 
    <input name='a_testani' value='' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</td> 
<td align="center"> 
    <input name='a_tksig' value='EC' class='inputbox-highlighted-false' size='2' maxlength='2'> 
</td> 
<td align="center"> 
    <input name='a_sacnon' value='' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</td> 
<td align="center"> 
    <input name='a_ot' value='' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</td> 
<td align="center"> 
    <input name='a_ovlp' value='' class='inputbox-highlighted-false' size='1' maxlength='1'> 

Upvotes: 1

Views: 2136

Answers (3)

Leigh

Reputation: 28873

For parsing HTML, I would recommend using JSoup instead of regular expressions. I just started using JSoup and found it extremely simple to use. Just download the jar and add it to your application's class path.
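If you would rather not touch the server-wide class path, one possible way to load the jar per application is the `this.javaSettings` setting in Application.cfc. This is only a sketch and assumes ColdFusion 10 or later; the application name and the `lib` folder are made-up placeholders, not anything from the question.

```cfml
// Application.cfc
component {

    // hypothetical application name, used only for illustration
    this.name = "formParserApp";

    // assumption: the jsoup jar was copied into a "lib" folder
    // that sits next to this Application.cfc
    this.javaSettings = {
        loadPaths = [ "./lib" ],
        reloadOnChange = false
    };

}
```

With something like this in place, `createObject("java", "org.jsoup.Jsoup")` should resolve without editing the server class path. On older versions, dropping the jar into the server's lib folder and restarting is the usual route.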

I am not an expert by any means, but was able to print all of the "input" fields from your sample html page using this snippet:

<cfscript>
    // parse html string into document
    jsoup = createObject("java", "org.jsoup.Jsoup");
    doc = jsoup.parse( yourHTMLContentString );

    // grab all "input" fields
    fields = doc.select("input");

    for (elem in fields) {
        // get attributes of each field
        fieldName = elem.attr("name");
        fieldValue = elem.attr("value");
        fieldType = elem.attr("type");

        // display values
        WriteOutput("<br>type: "& fieldType 
              &" name: "& fieldName 
              &" value: "& fieldValue
        );
    }

</cfscript>

(.. and yes, despite your moniker, I am suggesting "JSoup4You" )


Update:

The fields variable behaves like an array in CFML (JSoup returns a Java list of elements), so you can loop through it the same way you would any CF array. It seems like double work, but if you prefer, you can extract the input names and values into your own array of structures (or whatever CF construct you like). For example:

// initialize storage array
yourArray = [];

for (elem in fields) {

    // extract field properties into a structure 
    data = { name=elem.attr("name")
            , value=elem.attr("value")
            , type=elem.attr("type")
    };

    // store in array
    arrayAppend(yourArray, data);
}

// display array contents
WriteDump(yourArray);
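If a name-to-value lookup is more convenient for the database step than an array, the same loop can fill a structure instead. The sketch below is an assumption-laden variation, not part of the original snippet: it uses the selector `input[name]` so that inputs without a name attribute are skipped, and `formData` is just an illustrative variable name.

```cfml
<cfscript>
    // assumption: yourHTMLContentString holds the page HTML, as in the first snippet
    jsoup = createObject("java", "org.jsoup.Jsoup");
    doc = jsoup.parse( yourHTMLContentString );

    // "input[name]" matches only input elements that carry a name attribute
    namedFields = doc.select("input[name]");

    // formData is a hypothetical name for the resulting name -> value structure
    formData = {};
    for (elem in namedFields) {
        formData[ elem.attr("name") ] = elem.attr("value");
    }

    // display structure contents
    WriteDump(formData);
</cfscript>
```

Keep in mind that a structure holds one value per key, so if the form repeats a field name, later inputs overwrite earlier ones.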

Upvotes: 4

John Whish

Reputation: 3036

You could try parsing it using two regular expressions to get the field names and field values. This is what I came up with using your example HTML.

<cfsavecontent variable="foo">
    <input name='a_glare' value='B' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</td> 
<td align="center"> 
    <input name='a_testani' value='' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</td> 
<td align="center"> 
    <input name='a_tksig' value='EC' class='inputbox-highlighted-false' size='2' maxlength='2'> 
</td> 
<td align="center"> 
    <input name='a_sacnon' value='' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</td> 
<td align="center"> 
    <input name='a_ot' value='' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</td> 
<td align="center"> 
    <input name='a_ovlp' value='' class='inputbox-highlighted-false' size='1' maxlength='1'> 
</cfsavecontent>

<!--- extract the fieldnames and field values attributes --->
<cfset fieldnames = rematch("name='[a-z_]+'", foo)>
<cfset fieldvalues = rematch("value='[^']*'", foo)>

<!--- extract the values and build a struct of fieldname : value --->
<cfset keys = {}>
<cfloop from="1" to="#arraylen(fieldnames)#" index="index">
  <cfset keys[rereplace(fieldnames[index], "name='|'", "", "all")] = rereplace(fieldvalues[index], "value='|'", "", "all")>
</cfloop>

<cfdump var="#keys#">  
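One caveat with matching names and values in two separate passes: the two arrays only line up if every input has both attributes, and the `[a-z_]+` pattern skips names containing digits or uppercase letters. A possible variation, sketched below under the assumption that attributes are always single-quoted as in the sample, is to match each whole input tag first and then read both attributes from that one tag.

```cfml
<cfscript>
    // foo is the HTML string captured by the cfsavecontent block above
    tags = reMatchNoCase("<input\b[^>]*>", foo);

    keys = {};
    for (tag in tags) {
        // look for name and value inside this single tag only
        nameMatch  = reMatchNoCase("\bname='[^']*'", tag);
        valueMatch = reMatchNoCase("\bvalue='[^']*'", tag);

        // skip inputs that have no name attribute
        if (arrayLen(nameMatch) == 0) {
            continue;
        }

        // strip the leading name=' and the trailing quote
        fieldName = reReplaceNoCase(nameMatch[1], "^name='|'$", "", "all");

        // a missing value attribute is treated as an empty string
        fieldValue = "";
        if (arrayLen(valueMatch) > 0) {
            fieldValue = reReplaceNoCase(valueMatch[1], "^value='|'$", "", "all");
        }

        keys[fieldName] = fieldValue;
    }

    writeDump(keys);
</cfscript>
```

This still breaks on double-quoted or unquoted attributes, so it is only as robust as the markup is consistent.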

Upvotes: 1

Mark A Kruger

Reputation: 7193

Well, here's one idea that may not be any better than simply regexing out everything.

1) Add a closing slash to each of your input tags so they look like this:

   <input name='a_ot'
          value=''
          class='inputbox-highlighted-false'
          size='1'
          maxlength='1'/>

2) Extract the whole table, starting with the <table> tag and ending with the </table> tag.

3) Parse the table into an XML object using XMLParse.

Now you have an XML object with an array of TD tags, each of which has an INPUT child with name and value attributes. You can use cfdump and loop code to extract the values or clean them up.
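A rough sketch of steps 2 and 3 might look like the following. It assumes a variable `tableHtml` (a made-up name) that already holds the cleaned-up table markup, and it assumes that markup is well-formed XML; XMLParse will throw an error otherwise. XmlSearch with the XPath `//input` is used here instead of walking the TD array by hand.

```cfml
<cfscript>
    // assumption: tableHtml holds the <table>...</table> fragment,
    // with every input tag self-closed so it parses as XML
    tableXml = xmlParse(tableHtml);

    // find every input element anywhere inside the table
    inputs = xmlSearch(tableXml, "//input");

    // inputValues is a hypothetical name for the name -> value structure
    inputValues = {};
    for (node in inputs) {
        attrs = node.xmlAttributes;

        // only keep inputs that actually have a name attribute
        if (structKeyExists(attrs, "name")) {
            if (structKeyExists(attrs, "value")) {
                inputValues[ attrs["name"] ] = attrs["value"];
            } else {
                inputValues[ attrs["name"] ] = "";
            }
        }
    }

    writeDump(inputValues);
</cfscript>
```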

Again, this may not save you any time depending on how messy the HTML is and how hard you have to work to figure out the XML. Good luck.

Upvotes: 0
