My aim is to extract structured data from webpages. I'm using the code from this SO question, with the Apache Any23 CLI library as a dependency in my Spring project.
With this, I can extract HTML5 Microdata (Schema.org) from webpages, but I can't extract the JSON-LD present in the same pages. According to Apache Any23's documentation, JSON-LD is supported, but I couldn't find any further documentation on it.
Usually, if you create a new Any23 instance with new Any23(), embedded JSON-LD extraction should work out of the box. If you use another constructor, such as Any23(String... extractorNames), you have to make sure that the extractor for embedded JSON-LD is included in the list; its name is "html-embedded-jsonld".
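If you do pass an explicit list to the varargs constructor, a small helper can guarantee that the embedded JSON-LD extractor is always part of it. This is a minimal, stdlib-only sketch: the class and method names are hypothetical, "html-microdata" is assumed to be the name of Any23's Microdata extractor, and the actual Any23 call appears only as a comment because it needs the any23-core dependency on the classpath.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class, not part of Any23.
public class ExtractorNames {
    // Name of Any23's extractor for JSON-LD embedded in HTML <script> tags.
    static final String EMBEDDED_JSONLD = "html-embedded-jsonld";

    // Returns the requested extractor names with the embedded JSON-LD
    // extractor appended if it is missing (order kept, no duplicates).
    static String[] withEmbeddedJsonLd(String... requested) {
        Set<String> names = new LinkedHashSet<>(Arrays.asList(requested));
        names.add(EMBEDDED_JSONLD);
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // "html-microdata" is an assumed extractor name for HTML5 Microdata.
        String[] names = withEmbeddedJsonLd("html-microdata");
        System.out.println(String.join(",", names));
        // With any23-core available, the names would then be passed to the
        // varargs constructor:
        //   Any23 runner = new Any23(names);
    }
}
```

Running main prints html-microdata,html-embedded-jsonld. A LinkedHashSet keeps the caller's order and ensures the JSON-LD extractor is never listed twice.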
Now if there are any errors in the extraction process, Any23 drops them silently. (It's great, I know!)
I found that you can set a breakpoint in the notifyIssue method of org.apache.any23.extractor.ExtractionResultImpl. This may reveal a more detailed reason for your problem.