Reputation: 75
While working on the DBpedia extraction framework, I am facing issues with the CSV files from the Core Dataset. I'm interested in extracting data (in my case, the abstracts of all companies' Wikipedia pages) from the DBpedia dumps (RDF format). I'm following the instructions from the DBpedia Abstract Extraction Step-by-step Guide.
Commands used:
$ git clone git://github.com/dbpedia/extraction-framework.git
$ cd extraction-framework
$ mvn clean install
$ cd dump
$ ../run download config=download.minimal.properties
$ ../run extraction extraction.default.properties
I get the error below when executing the last command ("../run extraction extraction.default.properties"). Can anyone point out what mistake I am making? Is there a specific CSV file I need to process, or is this a configuration issue? I have the full "mediawiki-1.24.1".
Also, please note that I downloaded pages-articles.xml.bz2 only partially, up to 256 MB. Please help.
parsing /opt/extraction-framework-master/DumpsData/wikidatawiki/20150113/wikipedias.csv
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.Exception: expected [15] fields, found [1] in line [%21%21%21 http://www.w3.org/2000/01/rdf-schema#label !!! l]
at org.dbpedia.extraction.util.WikiInfo$.fromLine(WikiInfo.scala:60)
at org.dbpedia.extraction.util.WikiInfo$$anonfun$fromLines$1.apply(WikiInfo.scala:49)
at org.dbpedia.extraction.util.WikiInfo$$anonfun$fromLines$1.apply(WikiInfo.scala:49)
at scala.collection.Iterator$class.foreach(Iterator.scala:743)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1195)
at org.dbpedia.extraction.util.WikiInfo$.fromLines(WikiInfo.scala:49)
at org.dbpedia.extraction.util.WikiInfo$.fromSource(WikiInfo.scala:36)
at org.dbpedia.extraction.util.WikiInfo$.fromFile(WikiInfo.scala:27)
at org.dbpedia.extraction.util.ConfigUtils$.parseLanguages(ConfigUtils.scala:83)
at org.dbpedia.extraction.dump.sql.Import$.main(Import.scala:29)
at org.dbpedia.extraction.dump.sql.Import.main(Import.scala)
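The `expected [15] fields, found [1]` exception is thrown by `WikiInfo.fromLine` while parsing `wikipedias.csv`, and the offending line (`%21%21%21 http://www.w3.org/2000/01/rdf-schema#label !!! l`) looks like an RDF triple rather than a CSV row, which suggests the CSV itself may have been incompletely downloaded or overwritten. A quick sketch for locating malformed rows (the 15-field expectation comes from the stack trace; the comma delimiter and the file path, copied from the log above, are assumptions to adjust for your setup):

```shell
# Print every line of wikipedias.csv that does not have exactly 15
# comma-separated fields, prefixed with file name and line number.
# Path is the one from the error log; change it to match your machine.
csv=/opt/extraction-framework-master/DumpsData/wikidatawiki/20150113/wikipedias.csv
awk -F',' -v n=15 'NF != n { print FILENAME ":" NR ": " $0 }' "$csv"
```

If this prints lines that clearly are not CSV rows, deleting the file and re-running the download step should regenerate it.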
Upvotes: 1
Views: 377
Reputation: 75
I was facing the above issue because of an incomplete download of the enwiki-20150205-pages-articles.xml.bz2 file, fetched with
$ ../run download config=download.minimal.properties
However, I am still failing to resolve the abstract extraction issue, as I am expecting long abstracts from the DBpedia dump.
$ ../run extraction extraction.abstracts.properties
It builds completely and performs extraction over 10 million+ pages, but no data appears in long_abstracts_en.nt.
I followed the instructions to set up MediaWiki, PHP, MySQL, etc.
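Since the root cause here was a truncated download, it may help to verify the bzip2 stream before re-running extraction. A minimal sketch (the dump file name is the one mentioned in the post; its location depends on where the download step placed it):

```shell
# bzip2 -t test-decompresses the archive without writing output and
# exits non-zero if the stream is truncated or corrupt.
dump="enwiki-20150205-pages-articles.xml.bz2"
if bzip2 -t "$dump"; then
    echo "dump is a complete bzip2 stream"
else
    echo "dump is truncated or corrupt; re-run ../run download" >&2
fi
```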
Upvotes: 0