Reputation: 2114
I am trying to parse a string from a website using Jsoup, and I wrote the following test to verify that the parsing works.
This is my test:
@Test
public void extractBookData() throws Exception {
    String bookLink = ""; // some address
    // get() already returns a Document; calling html() on it would yield a String
    Document doc = Jsoup.connect(bookLink).get();
    Book book = new Book();
    assertEquals("Literatür Yayıncılık", book.getPublisher(doc));
}
This is the getPublisher(Element) method:
public String getPublisher(Element element) {
    String tableRowSelector = "tr:contains(Yayınevi)";
    String tableColumnSelector = "td";
    String tableRowData = "";
    // Take the last table row that mentions the publisher label
    element = element.select(tableRowSelector).last();
    if (element != null) {
        // Take the last cell of that row, which holds the value
        element = element.select(tableColumnSelector).last();
        if (element != null) {
            // Strip the row label and any colon, then trim
            tableRowData = element.text()
                    .replaceAll(tableRow.getRowName() + " ?:", "")
                    .replaceAll(tableRow.getRowName() + " :?", "")
                    .replaceAll(" ?: ?", "")
                    .trim();
        }
    }
    return tableRowData;
}
The problem is that the actual and expected strings appear identical, yet JUnit reports that they differ.
Any suggestions are welcome.
Upvotes: 2
Views: 2013
Reputation: 1240
I have had this same issue before. The cause is a non-breaking space (char 160, U+00A0) in your text where you expect a regular space (char 32). In my case the text came from an HTML text input value, and yours looks like it has also come from HTML.
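One way to confirm this is to print the numeric code of every character in both strings and compare them. The following is only a minimal sketch: the class name CodePointDump and the method describe are made up for illustration and are not part of Jsoup or JUnit.

```java
public class CodePointDump {

    // Hypothetical helper: returns the decimal code of each char in the
    // string, separated by single spaces, so invisible differences show up.
    public static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append((int) text.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Regular space between the letters
        System.out.println(describe("a b"));       // prints "97 32 98"
        // Non-breaking space between the letters
        System.out.println(describe("a\u00A0b"));  // prints "97 160 98"
    }
}
```

If the string returned by getPublisher shows 160 where the expected string shows 32, the non-breaking space is the culprit.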
The solution I used was simply to replace all non-breaking space chars with a regular space.
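A minimal sketch of that replacement, assuming the scraped text contains U+00A0 between the two words. The class name NbspNormalizer and the method normalizeSpaces are hypothetical names chosen for this example; only String.replace is a real JDK call here.

```java
public class NbspNormalizer {

    // Hypothetical helper: replaces every non-breaking space (U+00A0,
    // char 160) with a regular space (U+0020, char 32).
    public static String normalizeSpaces(String text) {
        return text.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Assumed shape of the scraped value: a non-breaking space between words
        String scraped = "Literatür\u00A0Yayıncılık";
        String expected = "Literatür Yayıncılık";

        System.out.println(scraped.equals(expected));                  // prints "false"
        System.out.println(normalizeSpaces(scraped).equals(expected)); // prints "true"
    }
}
```

In the question's code, the same replace call could be applied to element.text() before the other replaceAll calls and trim(). Note that trim() by itself does not remove U+00A0, because it only strips characters with codes up to 32.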
Upvotes: 2