Nusrat

Reputation: 699

How do I extract inner text from HTML markup?

I have the following code:

import Text.HTML.TagSoup

parseTags "<hello>my&amp;</world>" 

This gives me [TagOpen "hello" [],TagText "my&",TagClose "world"], but I only want [TagText "my&"]. I can get that with:

filter (~== "my&") $ parseTags "<hello>my&amp;</world>"

This returns [TagText "my&"], but it only works because I already know the text, "my&". In general I do not know what is inside the TagText. My ultimate goal is the string "my&" itself, which I can currently get only with:

map fromTagText $ filter (~== "my&") $ parseTags "<hello>my&amp;</world>"

I tried to use TagText directly, but could not get it to work.

Upvotes: 3

Views: 331

Answers (2)

somesoaccount

Reputation: 1267

If you really only want the string "my&", you can use innerText from TagSoup:

innerText (parseTags "<hello>my&amp;</world>")

innerText keeps only the text tags and concatenates their contents. So this

innerText (parseTags "<hello>my&amp;</world><foo>bar</foo>")

gets you "my&bar".
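As a complete program, the two calls above could be wrapped as in the following minimal sketch. The helper name allText is hypothetical, introduced here only for illustration; innerText and parseTags are the TagSoup functions already used above.

```haskell
import Text.HTML.TagSoup (innerText, parseTags)

-- Hypothetical helper: concatenate every text node in the markup,
-- discarding all open and close tags.
allText :: String -> String
allText = innerText . parseTags

main :: IO ()
main = do
  putStrLn (allText "<hello>my&amp;</world>")               -- my&
  putStrLn (allText "<hello>my&amp;</world><foo>bar</foo>") -- my&bar
```

Because the result is a single String, the boundaries between text nodes are lost. If you need each node separately, innerText is probably not the right tool.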

Upvotes: 1

Daniel Wagner

Reputation: 152707

> filter isTagText (parseTags "<hello>my&amp;</world>")
[TagText "my&"]
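To go from that list to the bare strings without knowing the text in advance, one option is to pattern-match on the TagText constructor, or to use TagSoup's maybeTagText. The following is a minimal sketch, not the only way to do it; textNodes and textNodes' are hypothetical helper names.

```haskell
import Data.Maybe (mapMaybe)
import Text.HTML.TagSoup (Tag (TagText), maybeTagText, parseTags)

-- Hypothetical helper: keep the contents of every text node as a
-- separate String. Tags that do not match the TagText pattern are
-- silently skipped by the list comprehension.
textNodes :: String -> [String]
textNodes markup = [t | TagText t <- parseTags markup]

-- The same result via maybeTagText, which returns Nothing for
-- anything that is not a TagText.
textNodes' :: String -> [String]
textNodes' = mapMaybe maybeTagText . parseTags

main :: IO ()
main = do
  print (textNodes  "<hello>my&amp;</world>")  -- ["my&"]
  print (textNodes' "<hello>my&amp;</world>")  -- ["my&"]
```

Both versions avoid fromTagText, which is partial and fails on tags that are not TagText.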

Upvotes: 3
