ragaa

Reputation: 61

Nutch crawler is crawling ' as â€™

The Nutch crawler is crawling "Let's" as "Letâ€™s". Why? Is there any setting to change this charset?

Upvotes: 0

Views: 300

Answers (2)

Jon Skeet

Reputation: 1503090

I haven't used Nutch myself, but this page looks like it's relevant:

To enable passing of UTF-8 characters, edit $TOMCAT/conf/server.xml. Locate the <Connector> tag for the web (look for "8080") and insert this parameter assignment: URIEncoding="UTF-8" as explained in Tomcat 5 FAQ at http://tomcat.apache.org/faq/connectors.html#utf8

Upvotes: 1

Jim Garrison

Reputation: 86774

â€™ is what you get when the UTF-8 encoding of the right single quotation mark (’, U+2019, not the ASCII apostrophe) is interpreted as Windows-1252. You need to read the content with the right encoding (UTF-8). This link may help.
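
To see where the three garbled characters come from, here is a minimal Java sketch (not Nutch code; the class and method names are made up for illustration). It encodes the curly quote as UTF-8, decodes those bytes as Windows-1252, and then reverses the mistake. It assumes the JDK in use ships the windows-1252 charset, which standard builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Assumption: windows-1252 is available in this JDK (it is not one of the
    // charsets every Java platform is required to support).
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Reproduce the bug: UTF-8 bytes read back as Windows-1252.
    public static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Undo the bug: recover the UTF-8 bytes and decode them correctly.
    public static String repair(String s) {
        return new String(s.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Let\u2019s";      // "Let" + right single quote + "s"
        String garbled = garble(original);   // U+2019 is E2 80 99 in UTF-8

        // E2 -> U+00E2, 80 -> U+20AC, 99 -> U+2122 under Windows-1252
        System.out.println(garbled.equals("Let\u00e2\u20ac\u2122s")); // true
        System.out.println(repair(garbled).equals(original));         // true
    }
}
```

The repair step is only a diagnostic aid: a few byte values have no Windows-1252 mapping, so the round trip can lose data. The real fix is to decode the fetched content as UTF-8 in the first place.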

Upvotes: 1
