user2051347

Reputation: 1669

Web-Harvest - scraping a URL

I am using Web-Harvest and want to scrape data from this URL:

http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&Bezirke=&Arten=&AngebotTyp=&timestamp=1363305908912

My code is:

<?xml version="1.0" encoding="UTF-8"?>

<config>
    <var-def name="google">
    <html-to-xml>
    <http url="http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&Bezirke=&Arten=&AngebotTyp=&timestamp=1363305908912"></http>
    </html-to-xml>
    </var-def>
</config>

However, I get this error:

Reference to the entity Bezirke has to end with an ';'

I do not understand what Web-Harvest means by this. Where is the ';' supposed to go?

Upvotes: 2

Views: 1150

Answers (2)

Josip Maslac

Reputation: 270

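You should encode the ampersands in your URL, i.e. replace every & with &amp;. In XML, a bare & starts an entity reference, so the parser reads &Bezirke as the beginning of an entity named Bezirke and expects a terminating ';'.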
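
Applied to the config from the question, the change might look like the sketch below. Only the value of the url attribute differs; everything else, including the var-def name, is copied from the question unchanged, and it has not been run against the live site.

```xml
<?xml version="1.0" encoding="UTF-8"?>

<config>
    <var-def name="google">
        <html-to-xml>
            <!-- Every literal ampersand in the query string is written as &amp;
                 so the XML parser no longer treats it as the start of an entity reference. -->
            <http url="http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&amp;Bezirke=&amp;Arten=&amp;AngebotTyp=&amp;timestamp=1363305908912"></http>
        </html-to-xml>
    </var-def>
</config>
```

The XML parser turns each &amp; back into a plain & when it reads the attribute, so the URL that is actually requested is the same one as before.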

Upvotes: 1

ObjectNameDisplay

Reputation: 493

I don't know much about Web-Harvest, but their example has this:

<xpath expression="//a[@shape='rect']/@href">
    <html-to-xml>
        <http url="http://www.somesite.com/"/>
    </html-to-xml>
</xpath>

That is, the example uses a self-closing tag:

<http url=".."/>

whereas your code has:

<http url=".."></http>

Maybe this is your problem? There is no need for a closing tag.

Upvotes: 1
