Sonson123

Reputation: 11427

Parse meta tag and get HTML content from body with Tika

I parse files with the great Apache Tika library. I want to extract the meta tags with my own parser, then get only the content of the <body> tag as HTML and store it in a database.

I have been trying this for hours/days now :-(, but cannot find a solution.

Upvotes: 1

Views: 3045

Answers (1)

Mantra

Reputation: 346

Check whether the following links help you a bit:

Content Detection, Metadata and Content Extraction with Apache Tika

Parsing HTML with Apache Tika
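For what it's worth, here is a minimal sketch of one way this might be done, assuming a Tika 1.x/2.x release with the HTML parser on the classpath. The class name `BodyHtmlExtractor` and the method `extractBody` are made up for illustration. The idea is to wrap a `ToXMLContentHandler` in a `BodyContentHandler`, so only the SAX events inside `<body>` get serialized as markup, while Tika's `HtmlParser` copies the `<meta name=... content=...>` pairs into the `Metadata` object.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.ToXMLContentHandler;

// Hypothetical helper class, not part of Tika.
public class BodyHtmlExtractor {

    // Parses the HTML stream, fills `metadata` with what Tika finds
    // (including <meta> tags) and returns the markup inside <body>.
    public static String extractBody(InputStream in, Metadata metadata) throws Exception {
        // Serializes the SAX events it receives back to XHTML markup.
        ToXMLContentHandler xml = new ToXMLContentHandler();
        // Forwards only the events that occur inside <body> to `xml`.
        BodyContentHandler body = new BodyContentHandler(xml);
        new HtmlParser().parse(in, body, metadata, new ParseContext());
        return xml.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document, made up for this sketch.
        String html = "<html><head><title>Demo</title>"
                + "<meta name=\"description\" content=\"A test page\"></head>"
                + "<body><h1>Hello</h1><p>World</p></body></html>";

        Metadata metadata = new Metadata();
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(extractBody(in, metadata));
        }

        // Meta tags (and other detected properties) end up in the Metadata object.
        for (String name : metadata.names()) {
            System.out.println(name + " = " + metadata.get(name));
        }
    }
}
```

Note that Tika normalizes the HTML to XHTML and, by default, drops some elements and attributes, so the stored markup may not match the original byte for byte. If you need it closer to the source, setting `IdentityHtmlMapper.INSTANCE` for `HtmlMapper.class` on the `ParseContext` may help. If you really want your own meta tag parser instead of the `Metadata` object, a `TeeContentHandler` can feed the same parse to a second, custom handler.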

Upvotes: 2
