Andrey Kuznetsov

Reputation: 11830

HXT getting first element: refactor weird arrow

I need to get the text content of the first <p> that is a child of <div class="about">. I wrote the following code:

tagTextS :: IOSArrow XmlTree String
tagTextS = getChildren >>> getText >>> arr stripString

parseDescription :: IOSArrow XmlTree String
parseDescription =
  (
   deep (isElem >>> hasName "div" >>> hasAttrValue "id" (== "company_about_full_description"))
   >>> (arr (\x -> x) /> isElem  >>> hasName "p") >. (!! 0) >>> tagTextS
  ) `orElse` (constA "")

Look at this arr (\x -> x): without it I wasn't able to get the result.

Upvotes: 2

Views: 368

Answers (2)

Gabriel Riba

Reputation: 6738

Another proposal, using only the HXT core as you asked.

Selecting the first child cannot be done by post-processing the output of getChildren with a further arrow, because HXT's (>>>) applies each subsequent arrow to every item of the preceding arrow's output list, not to the list as a whole. This is explained on the HaskellWiki HXT page; the definition shown there is an old one, and (>>>) now derives from Category's (.) composition.
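To make that composition behaviour concrete, here is a minimal toy model that uses base only. It is not HXT's actual code: ListArrow, children, double and the two demo values are hypothetical names for illustration, and (>>.) is merely modelled on the HXT operator of the same name.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..), (>>>))

-- Toy stand-in for a list arrow: a function that returns a list of results.
newtype ListArrow a b = ListArrow { runListArrow :: a -> [b] }

instance Category ListArrow where
  id = ListArrow (\x -> [x])
  -- Composition feeds every result of f into g and concatenates the outputs,
  -- so g never sees the result list as a whole.
  ListArrow g . ListArrow f = ListArrow (concatMap g . f)

-- Whole-list post-processing, modelled on HXT's (>>.).
infixl 8 >>.
(>>.) :: ListArrow a b -> ([b] -> [c]) -> ListArrow a c
ListArrow f >>. h = ListArrow (h . f)

-- Hypothetical helper: emits every element of the input list as a result.
children :: ListArrow [Int] Int
children = ListArrow (\xs -> xs)

-- Hypothetical helper: a one-result arrow applied to a single item.
double :: ListArrow Int Int
double = ListArrow (\x -> [x * 2])

-- double runs once per child.
allDoubled :: [Int]
allDoubled = runListArrow (children >>> double) [1, 2, 3]

-- (>>.) trims the result list before double runs.
firstDoubled :: [Int]
firstDoubled = runListArrow (children >>. take 1 >>> double) [1, 2, 3]
```

In this model allDoubled evaluates to [2,4,6] and firstDoubled to [2]: a plain (>>>) cannot pick "the first" result, so the selection has to go through a whole-list operator such as (>>.) or (>.).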

getNthChild can be adapted from the definition of getChildren in Control.Arrow.ArrowTree:

import Control.Arrow.ArrowList (ArrowList (arrL))
import Data.Tree.Class (Tree)
import qualified Data.Tree.Class as T

-- if the nth child does not exist, the arrow yields no results
getNthChild :: (ArrowList a, Tree t) => Int -> a (t b) (t b)
getNthChild n = arrL (take 1 . drop n . T.getChildren)

then your parseDescription could take this form:

-- importing Text.XML.HXT.Arrow.XmlArrow (hasName, hasAttrValue)

parseDescription = 
    deep (isElem >>> hasName "div" >>> hasAttrValue "class" (== "about") 
          >>> getNthChild 0 >>> hasName "p"
          ) 
    >>> getChildren >>> getText

Update: I found another way, using changeChildren:

getNthChild :: (ArrowTree a, Tree t) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n) >>> getChildren

Update: to skip the whitespace text nodes that sit between elements, filter out the non-element children before selecting:

import qualified Text.XML.HXT.DOM.XmlNode as XN

getNthChild :: (ArrowTree a, Tree t, XN.XmlNode b) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n . filter XN.isElem) >>> getChildren
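The difference between the positional and the element-only variant shows up as soon as the markup is indented. The sketch below assumes the hxt package is installed and uses xread and runLA from Text.XML.HXT.Core to run the arrows on an in-memory string. The names nthChild, nthElemChild, sample and textOfFirst are hypothetical, and the types are specialised to XmlTree for brevity.

```haskell
import Text.XML.HXT.Core
import qualified Text.XML.HXT.DOM.XmlNode as XN

-- Positional variant: counts every child node, including text nodes.
nthChild :: ArrowXml a => Int -> a XmlTree XmlTree
nthChild n = changeChildren (take 1 . drop n) >>> getChildren

-- Element-only variant: drops non-element children before counting.
nthElemChild :: ArrowXml a => Int -> a XmlTree XmlTree
nthElemChild n =
  changeChildren (take 1 . drop n . filter XN.isElem) >>> getChildren

-- Hypothetical input with whitespace between the elements.
sample :: String
sample = "<div class=\"about\">\n  <p>first</p>\n  <p>second</p>\n</div>"

-- Parse the sample, pick one child of the div, keep it only if it is a <p>,
-- and return its text.
textOfFirst :: LA XmlTree XmlTree -> [String]
textOfFirst pick =
  runLA
    (xread >>> hasName "div" >>> pick >>> hasName "p" >>> getChildren >>> getText)
    sample

positional, elementsOnly :: [String]
positional   = textOfFirst (nthChild 0)
elementsOnly = textOfFirst (nthElemChild 0)
```

With this input, positional should come out empty, because child 0 is the whitespace text node that follows the opening tag, while elementsOnly should be ["first"].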

Upvotes: 2

Gabriel Riba

Reputation: 6738

With XPath (from the hxt-xpath package), it could be something like this:

{-# LANGUAGE PackageImports #-}  -- needed for the package-qualified import below
import "hxt-xpath" Text.XML.HXT.XPath.Arrows (getXPathTrees)

...

xp = "//div[@class='about']/p[1]"

parseDescription = getXPathTrees xp >>> getChildren >>> getText

Upvotes: 4
