Reputation: 6589

Nokogiri converting &nbsp; to ?. How can I get it to convert to a space?

I open my doc like this doc = Nokogiri::HTML(open(team_url)) and later on I'm parsing through an HTML table's <td> elements.

In the HTML, there is often an element that looks like this

<td>&nbsp;</td>

When I do a

content = row.xpath("td[1]/text()")

I end up getting ? as a result for content, instead of a space.

Why is this, and how can I resolve it?

Upvotes: 1

Reputation: 21

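Nokogiri converts &nbsp; to the Unicode no-break space character (U+00A0), not to an ordinary space. The ? is most likely how that character is displayed when your output encoding cannot represent it. You can do a global substitution to resolve it.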

content.text.gsub("\u00A0", ' ') # replace &nbsp; with space

content.text.gsub("\u00A0", '') # remove &nbsp;
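To make the substitution easier to reuse, here is a minimal sketch in plain Ruby. The helper name normalize_nbsp and the sample string are made up for illustration; the string stands in for whatever content.text returns for a <td>&nbsp;</td> cell, so Nokogiri itself is not needed to run it.

```ruby
# Hypothetical helper, not part of Nokogiri: swaps every no-break space
# (U+00A0) for an ordinary space.
NBSP = "\u00A0"

def normalize_nbsp(text)
  text.gsub(NBSP, ' ')
end

# Stand-in for the text Nokogiri returns for <td>&nbsp;</td>.
cell = NBSP.dup

# String#strip does not treat U+00A0 as whitespace, so the raw cell
# does not look empty until it has been normalized.
puts cell.strip.empty?                  # false
puts normalize_nbsp(cell).strip.empty?  # true
```

Note that gsub returns a new string, so assign the result (or use gsub!) if you want to keep it.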

Upvotes: 2