Hafiz Muhammad Shafiq

Reputation: 8670

Apache Nutch 2.3.1 OPIC scoring filter not working

I have configured Nutch 2.3.1 with a complete Hadoop/HBase ecosystem on a small cluster. I am curious about the scoring algorithm used in Nutch, so I enabled the OPIC scoring filter. To see its impact, I checked the scores at different steps (in the dbupdate and generate phases), as described on the Nutch wiki. However, every document's score always remains zero, no matter how many iterations I run and how many documents I fetch. Is there a problem in the OPIC implementation, or am I missing some configuration?

I have observed that the _csh_ field, which holds the OPIC cash value, is removed from the corresponding HBase table during the fetch phase.
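The symptom can be reasoned about with a simplified model of OPIC (On-line Page Importance Computation). This Java sketch is a hypothetical, textbook version of the cash/history idea, not Nutch's `OPICScoringFilter` code: each page holds some cash; when a page is processed, its cash is added to its accumulated score and split evenly among its outlinks. If the cash is lost before it is banked (as the missing `_csh_` field suggests), nothing is ever added to the score, which would be consistent with scores staying at zero. The class name, method names and the in-memory link graph are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified OPIC sketch: the textbook cash/history idea,
// NOT Nutch's OPICScoringFilter implementation.
class OpicSketch {
    final Map<String, Double> cash = new HashMap<>();   // cash waiting to be distributed
    final Map<String, Double> score = new HashMap<>();  // accumulated ("banked") cash
    final Map<String, List<String>> outlinks;           // assumed in-memory link graph

    OpicSketch(Map<String, List<String>> outlinks, double initialCash) {
        this.outlinks = outlinks;
        for (String url : outlinks.keySet()) {
            cash.put(url, initialCash);
            score.put(url, 0.0);
        }
    }

    // Process one page: bank its cash into its score, then split that cash
    // evenly among its outlinks. Pages without outlinks simply lose their cash
    // in this sketch (real OPIC redistributes it, e.g. via a virtual page).
    void process(String url) {
        double c = cash.getOrDefault(url, 0.0);
        score.merge(url, c, Double::sum);
        cash.put(url, 0.0); // reset before distributing so self-links keep their share
        List<String> links = outlinks.getOrDefault(url, List.of());
        for (String target : links) {
            cash.merge(target, c / links.size(), Double::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c"),
                "c", List.of("a"));

        OpicSketch withCash = new OpicSketch(graph, 1.0);
        OpicSketch noCash = new OpicSketch(graph, 0.0); // cash lost, as with a dropped _csh_ field
        for (int round = 0; round < 3; round++) {
            for (String url : List.of("a", "b", "c")) {
                withCash.process(url);
                noCash.process(url);
            }
        }
        System.out.println("with cash:    " + withCash.score);
        System.out.println("without cash: " + noCash.score);
    }
}
```

With an initial cash of 1.0 the scores grow every round while the total cash in the graph stays constant; starting from 0.0 cash leaves every score at zero, however many rounds are run.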

Upvotes: 0

Views: 137

Answers (1)

user1264641

Reputation: 139

I resolved it by changing OPICScoringFilter.java:

src/plugin/scoring-opic/src/java/org/apache/nutch/scoring/opic/OPICScoringFilter.java

Instead of storing the cash value in the row's metadata as raw bytes, I store it in the row's markers as a Utf8 string:

-    row.getMetadata().put(CASH_KEY, ByteBuffer.wrap(Bytes.toBytes(score)));
+    row.getMarkers().put(CASH_KEY, new Utf8(Double.toString(score)));
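The diff shows only the write side. Presumably any code that reads the cash back must also look in the markers and parse the string instead of decoding bytes. The following self-contained Java sketch illustrates that round trip under stated assumptions: a plain `Map<CharSequence, CharSequence>` stands in for the row's markers, `String` stands in for Avro's `Utf8`, and the `CashStore` class with its `putCash`/`getCash` methods is hypothetical, not Nutch API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the answer's change: a plain Map plays the role of
// the row's markers and String plays the role of Avro's Utf8. Not Nutch API.
class CashStore {
    static final String CASH_KEY = "_csh_"; // key name taken from the question

    // Write side: store the cash as a decimal string, as the patch does.
    static void putCash(Map<CharSequence, CharSequence> markers, double cash) {
        markers.put(CASH_KEY, Double.toString(cash));
    }

    // Read side: parse the string back; a missing entry is treated as zero cash.
    static double getCash(Map<CharSequence, CharSequence> markers) {
        CharSequence raw = markers.get(CASH_KEY);
        return raw == null ? 0.0 : Double.parseDouble(raw.toString());
    }

    public static void main(String[] args) {
        Map<CharSequence, CharSequence> markers = new HashMap<>();
        System.out.println(getCash(markers)); // prints 0.0 (no cash stored yet)
        putCash(markers, 0.25);
        System.out.println(getCash(markers)); // prints 0.25
    }
}
```

Writing and reading through one pair of helpers keeps the storage location and the encoding (decimal string vs. raw bytes) in a single place, so the two sides cannot drift apart.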

Upvotes: 1
