Incerteza

Find a certain tag in html

I need to find a certain tag (the tag itself and its content) in html:

import Network.HTTP.Conduit (simpleHttp)
import Text.XML.Cursor
import Text.HTML.DOM (parseLBS)

page <- simpleHttp "http://example.com"
let cursor = fromDocument $ parseLBS page
let myTag = cursor -- find the tag <myTag myAttr="some Value">...</myTag>

How do I find the tag myTag in the response (cursor) by its name and attribute, given that it exists and is unique (no other tag has the same name and attribute)?

update:

let rightElement e = (TX.elementName e == Data.String.fromString "myTag") && ((Data.String.fromString "myAttr" :: TX.Name, T.pack "some Value") `Map.member` TX.elementAttributes e)

error:

 Couldn't match type `TX.Name' with `(TX.Name, T.Text)'
    Expected type: Map.Map (TX.Name, T.Text) T.Text
      Actual type: Map.Map TX.Name T.Text
    In the return type of a call of `TX.elementAttributes'
    In the second argument of `Map.member', namely
      `TX.elementAttributes e'

Answers (1)

Mark Whitfield

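This is probably best accomplished with checkNode. Note that checkNode only tests the node the cursor currently points at, so apply it to every descendant with $//:

-- assumes OverloadedStrings and: import qualified Data.Map as Map
let rightNode n = case n of
                    NodeElement e -> elementName e == "myTag" && Map.lookup "myAttr" (elementAttributes e) == Just "some Value"
                    _             -> False
let myTag = head $ cursor $// checkNode rightNode -- find the tag <myTag myAttr="some Value">...</myTag>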

I've used head here since you've said you're certain of the existence and uniqueness of the node, but the more correct thing to do would be to add some kind of failure mode, maybe an Either String with a message indicating nonexistence or nonuniqueness.
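A failure mode along those lines could look roughly like the sketch below. The helper name findUnique, the error messages, and the sample markup are all made up for illustration; the sample is parsed with Text.XML's parseLBS_ only to keep the snippet self-contained, and a cursor built from Text.HTML.DOM's parseLBS has the same type.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as Map
import Text.XML (Element (..), Node (..), def, parseLBS_)
import Text.XML.Cursor (Cursor, checkNode, fromDocument, ($//))

-- Same predicate as above: match on tag name and attribute value.
rightNode :: Node -> Bool
rightNode (NodeElement e) =
  elementName e == "myTag"
    && Map.lookup "myAttr" (elementAttributes e) == Just "some Value"
rightNode _ = False

-- Hypothetical helper: succeed only when exactly one descendant matches.
findUnique :: Cursor -> Either String Cursor
findUnique cursor =
  case cursor $// checkNode rightNode of
    [c] -> Right c
    []  -> Left "no matching myTag element"
    _   -> Left "more than one matching myTag element"

-- Sample input (hypothetical), parsed as XML to keep the sketch self-contained.
sample :: Cursor
sample = fromDocument $ parseLBS_ def
  "<root><myTag myAttr=\"some Value\">text</myTag></root>"
```

With this in place, findUnique sample should be a Right, and the caller decides what to do with a Left instead of crashing on head.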

EDIT: Actually, the case matching above is already wrapped up for us in the checkElement function:

-- assumes OverloadedStrings and: import qualified Data.Map as Map
let rightElement e = elementName e == "myTag" && Map.lookup "myAttr" (elementAttributes e) == Just "some Value"
let myTag = head $ cursor $// checkElement rightElement -- find the tag <myTag myAttr="some Value">...</myTag>

EDIT2: Okay, let's expand a bit, as per request. Working from the docs, the checkElement function has type

checkElement :: Boolean b => (Element -> b) -> Axis

where type Axis = Cursor -> [Cursor]. On its own, checkElement only tests the node the cursor currently points at: it returns [cursor] if that node is an element for which the function we hand it returns True, and [] otherwise. To search the whole subtree under cursor, apply it to every descendant with $//, as in cursor $// checkElement rightElement. The function in this case is the new rightElement I've defined. It returns True if it's the element you said you're looking for (that is, if both the tag name and attribute match), and False otherwise. 'n' and 'e' are just the argument names; that's it.
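To see how the axis behaves in practice, here is a small sketch that applies checkElement once directly to the root cursor and once to all of its descendants via $//. The sample markup and the names atRoot and inSubtree are invented for the example, and the document is parsed with Text.XML's parseLBS_ rather than fetched over HTTP.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as Map
import Text.XML (Element (..), def, parseLBS_)
import Text.XML.Cursor (Cursor, checkElement, fromDocument, ($//))

-- True only when both the tag name and the attribute value match.
rightElement :: Element -> Bool
rightElement e =
  elementName e == "myTag"
    && Map.lookup "myAttr" (elementAttributes e) == Just "some Value"

-- Sample document (hypothetical); the cursor starts at the <root> element.
cursor :: Cursor
cursor = fromDocument $ parseLBS_ def
  "<root><a/><myTag myAttr=\"some Value\">text</myTag></root>"

-- Applied directly, the axis tests only the node under the cursor (<root>).
atRoot :: [Cursor]
atRoot = checkElement rightElement cursor

-- Applied through ($//), the axis is tried on every descendant of <root>.
inSubtree :: [Cursor]
inSubtree = cursor $// checkElement rightElement
```

Here atRoot is empty because <root> itself does not match, while inSubtree holds the single matching element.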

So, to sum things up in terms of types:

rightElement                                :: Element -> Bool
checkElement                                :: (Element -> Bool) -> Cursor -> [Cursor]
checkElement rightElement                   :: Cursor -> [Cursor]
($//)                                       :: Cursor -> (Cursor -> [a]) -> [a]
cursor $// checkElement rightElement        :: [Cursor]
head (cursor $// checkElement rightElement) :: Cursor
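For comparison, the same lookup can probably be written without a hand-rolled predicate, using the element and attributeIs axes that Text.XML.Cursor already provides. This is only a sketch of an alternative, not what the code above does; findMyTag, tagText, and the sample markup are names made up here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad ((>=>))
import Data.Text (Text)
import Text.XML (def, parseLBS_)
import Text.XML.Cursor (Cursor, attributeIs, content, element, fromDocument, ($/), ($//))

-- Descendants named myTag whose myAttr attribute equals "some Value".
findMyTag :: Cursor -> [Cursor]
findMyTag c = c $// element "myTag" >=> attributeIs "myAttr" "some Value"

-- Sample document (hypothetical), parsed as XML to stay self-contained.
cursor :: Cursor
cursor = fromDocument $ parseLBS_ def
  "<root><myTag myAttr=\"other\">no</myTag><myTag myAttr=\"some Value\">text</myTag></root>"

-- Text children of every matching element.
tagText :: [Text]
tagText = concatMap ($/ content) (findMyTag cursor)
```

Both axes have type Cursor -> [Cursor], so they compose with >=> the same way checkElement rightElement would.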
