Reputation: 2681
I am using Emacs, SLIME, and SBCL to develop Common Lisp on a desktop PC running NixOS.
I am also using the libraries dex, plump, and clss to extract the title of a webpage. I ran:
CL-USER> (clss:select "title" (plump:parse (dex:get "http://www.pdelfino.com.br")))
#(#<PLUMP-DOM:ELEMENT title {1009C488E3}>)
I was expecting: "Pedro Delfino".
Instead, I got the object:
#(#<PLUMP-DOM:ELEMENT title {1009C488E3}>)
Describing the object does not help me find the value I want:
CL-USER> (clss:select "title" (plump:parse (dex:get "http://www.pdelfino.com.br")))
#(#<PLUMP-DOM:ELEMENT title {100A9888E3}>)
CL-USER> (describe *)
#(#<PLUMP-DOM:ELEMENT title {100A9888E3}>)
[vector]
Element-type: T
Fill-pointer: 1
Size: 10
Adjustable: yes
Displaced: no
Storage vector: #<(SIMPLE-VECTOR 10) {100A9B65BF}>
; No value
CL-USER>
Where is the value that I need?
Thanks
Upvotes: 1
Views: 250
Reputation: 18375
You can ask plump to return the text inside the HTML node with plump:text. It accepts a single node, not an array (which is what clss:select returns), so you have to use aref to get the first one.
(plump:text (aref
(clss:select "title" (plump:parse
(dex:get "http://www.pdelfino.com.br")))
0))
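plump:serialize outputs the node's HTML content (useful to inspect the results). By default it prints to standard output; pass nil as the stream argument to get the HTML back as a string.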
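To try this without a network request, the same steps can be wrapped in a small function and run on a literal HTML string. This is only a sketch: the name title-text is made up for illustration, and it assumes Quicklisp is available and that the file is loaded form by form (for example with load) rather than compiled as a whole.

```lisp
;; Assumes Quicklisp is installed; loads the two libraries used above.
(ql:quickload '(:plump :clss) :silent t)

;; TITLE-TEXT is a hypothetical helper name, not part of plump or clss.
(defun title-text (html)
  "Return the text of the first <title> element in the string HTML, or NIL if there is none."
  (let ((matches (clss:select "title" (plump:parse html))))
    ;; clss:select returns a vector, so check that it is non-empty
    ;; before taking the first element with aref.
    (when (plusp (length matches))
      (plump:text (aref matches 0)))))

(title-text "<html><head><title>Pedro Delfino</title></head><body></body></html>")
;; => "Pedro Delfino"
```

Checking the length first avoids an out-of-bounds error from aref when the page has no title element.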
You can also use CLSS and Plump together by using LQuery: https://shinmera.github.io/lquery/ We need to parse the HTML with initialize, then we use $ as in (lquery:$ <document> "selector"). We can add (text) or (serialize) as the last argument.
(defparameter *PDELFINO-PARSED* (lquery:$ (initialize (dex:get "http://www.pdelfino.com.br"))))
(lquery:$ *PDELFINO-PARSED* "title")
#(#<PLUMP-DOM:ELEMENT title {1008645923}>)
CIEL-USER> (lquery:$ *PDELFINO-PARSED* "title" (text))
#("Pedro Delfino")
CIEL-USER> (aref * 0)
"Pedro Delfino"
CIEL-USER> (lquery:$ *PDELFINO-PARSED* "title" (serialize))
#("<title>Pedro Delfino</title>")
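The LQuery steps above can also be sketched as one function that works on a literal HTML string. The name title-via-lquery is hypothetical, and the sketch assumes that initialize accepts an HTML string held in a variable the same way it accepts the string returned by dex:get in the transcript. Note that initialize also replaces LQuery's current document as a side effect.

```lisp
;; Assumes Quicklisp is installed.
(ql:quickload :lquery :silent t)

;; TITLE-VIA-LQUERY is a hypothetical helper name, not part of LQuery.
(defun title-via-lquery (html)
  "Return the text of the first <title> in the string HTML using LQuery, or NIL if none."
  ;; (initialize html) parses the string, "title" selects the nodes,
  ;; and (text) maps each selected node to its text content.
  (let ((titles (lquery:$ (initialize html) "title" (text))))
    (when (plusp (length titles))
      (aref titles 0))))

(title-via-lquery "<html><head><title>Pedro Delfino</title></head></html>")
```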
Upvotes: 2
Reputation: 11854
The text of the title is in its child text-node.
(plump:text (plump:first-child (aref (clss:select "title" (plump:parse (dex:get "http://www.pdelfino.com.br"))) 0)))
will return that text in this example.
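To see that structure directly, a small sketch can inspect the title element and its child on a literal HTML string. The function name inspect-title is made up for illustration, and Quicklisp is assumed to be available.

```lisp
;; Assumes Quicklisp is installed.
(ql:quickload '(:plump :clss) :silent t)

;; INSPECT-TITLE is a hypothetical helper name, not part of plump or clss.
(defun inspect-title (html)
  "Return a list describing the first <title> element of the string HTML:
its tag name, its number of children, whether the first child is a text node,
and the text stored in that child."
  (let* ((title (aref (clss:select "title" (plump:parse html)) 0))
         (child (plump:first-child title)))
    (list (plump:tag-name title)
          (length (plump:children title))
          (plump:text-node-p child)
          (plump:text child))))

(inspect-title "<title>Pedro Delfino</title>")
```

The last element of the returned list is the title string, read from the child text node rather than from the element itself.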
Upvotes: 2