I am trying to extract metadata using Apache Tika and put it into a HashMap, but my code gets only the keys, not their values. For example, it stores title as a key but not its value, and likewise it stores keywords as a key but not its value.
When I print what md contains, it shows this:
Description=
title=Wireless Technology & Innovation | Mobile Technology
Content-Encoding=UTF-8
Content-Type=text/html; charset=utf-8
Keywords=
google-site-verification=AzhlXdqBSdUCRPJRY1evCtp2Ko5r9kxB_f81WffACUc
private Map<String, String> metaData;

try {
    Metadata md = new Metadata();
    // Encode with an explicit charset instead of the platform default
    htmlStream = new ByteArrayInputStream(htmlContent.getBytes("UTF-8"));
    String parsedText = tika.parseToString(htmlStream, md);
    // Very unlikely to happen
    if (text == null) {
        text = parsedText.trim();
    }
    processMetaData(md);
} catch (Exception e) {
    e.printStackTrace();
} finally {
    IOUtils.closeQuietly(htmlStream);
}

private void processMetaData(Metadata md) {
    // Start with a fresh map unless the current one exists and is already empty
    if ((getMetaData() == null) || (!getMetaData().isEmpty())) {
        setMetaData(new HashMap<String, String>());
    }
    for (String name : md.names()) {
        // I think this line is the problem: it stores the key but not its value
        getMetaData().put(name.toLowerCase(), md.get(name));
    }
}

public Map<String, String> getMetaData() {
    return metaData;
}

public void setMetaData(Map<String, String> metaData) {
    this.metaData = metaData;
}
Any help would be appreciated.
First, Tika allows multiple values for a given key, so you're better off thinking of it as a Map<String, List<String>> rather than a simple Map<String, String>.
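To make the Map<String, List<String>> idea concrete, here is a minimal sketch in plain Java collections. MultiValuedMetadata is a hypothetical stand-in made up for illustration, not Tika's Metadata class. It only mimics the behaviour described above: adding to an existing key appends, get returns the first value, and getValues returns all of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT org.apache.tika.metadata.Metadata: it only models
// metadata as Map<String, List<String>> so that one key can hold several values.
final class MultiValuedMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends rather than overwrites, so a repeated key keeps every value.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // First value for the key, or null if the key was never added.
    String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Every value for the key in insertion order (empty list if absent).
    List<String> getValues(String name) {
        List<String> list = values.get(name);
        return list == null ? Collections.<String>emptyList() : Collections.unmodifiableList(list);
    }

    // True when the key holds more than one value.
    boolean isMultiValued(String name) {
        return getValues(name).size() > 1;
    }
}
```

If you add "tika" and then "metadata" under keywords, get("keywords") gives only "tika" while getValues("keywords") gives both. A Map<String, String> filled with put can only ever hold one value per key, which is why copying metadata into one can silently drop data.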
I'd suggest you look at the Tika Metadata JavaDocs. You'll either want to check the isMultiValued(String key) method for each key, or just call getValues(String key) every time.
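A rough sketch of the "just call getValues every time" approach. To keep it compilable without Tika on the classpath, the hypothetical helper MetadataCopy.toMultiMap takes the key names and a lookup function rather than a Metadata object. Assuming Tika's Metadata#names() and Metadata#getValues(String) as described in the JavaDocs, you would call it as MetadataCopy.toMultiMap(md.names(), md::getValues).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: copies every key together with ALL of its values.
final class MetadataCopy {
    // names:    the keys to copy (assumed to come from Metadata#names())
    // valuesOf: returns all values for a key (assumed to be Metadata#getValues)
    static Map<String, List<String>> toMultiMap(String[] names,
                                                Function<String, String[]> valuesOf) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String name : names) {
            // Look up with the original key, store under a lower-cased key as the question does
            List<String> all = new ArrayList<>(Arrays.asList(valuesOf.apply(name)));
            result.put(name.toLowerCase(Locale.ROOT), all);
        }
        return result;
    }
}
```

Taking a function instead of the Metadata object is just a design choice for this sketch; it keeps the copying logic testable with an ordinary map standing in for the parsed document.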
To get the first value for a given key, metadata.get(String key) is the right way to go, so I'm not sure why it isn't working for you.
You probably want to play with the Tika App jar; that's the best way to debug things, e.g.:
java -jar tika-app-1.0-SNAPSHOT.jar --metadata problem.file
That'll let you easily see the metadata your file really contains. Once you know that, you can track down where in your code you're going wrong.
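If you'd rather inspect the metadata from your own code, something along these lines prints every key with all of its values quoted, so an empty value shows up as "" instead of as nothing after the = sign. MetadataDump is a hypothetical helper using the same names/lookup convention as above, e.g. MetadataDump.dump(md.names(), md::getValues), assuming Tika's names() and getValues(String).

```java
import java.util.StringJoiner;
import java.util.function.Function;

// Hypothetical debugging helper: one line per key, every value in double quotes,
// so empty strings stay visible and multiple values are listed side by side.
final class MetadataDump {
    static String dump(String[] names, Function<String, String[]> valuesOf) {
        StringBuilder out = new StringBuilder();
        for (String name : names) {
            // Renders as [] when there are no values, ["a", "b"] when there are several
            StringJoiner joined = new StringJoiner(", ", "[", "]");
            for (String value : valuesOf.apply(name)) {
                joined.add("\"" + value + "\"");
            }
            out.append(name).append(" = ").append(joined).append('\n');
        }
        return out.toString();
    }
}
```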