Reputation: 151
I need to parse an HTML file for a homework project, so I can't use Jsoup.
I have tried crawling through the file, but I don't know how to save what I'm looking for.
This is what I have:
FileInputStream fis = new FileInputStream(filename);
InputStreamReader inStream = new InputStreamReader(fis);
BufferedReader reader = new BufferedReader(inStream);
String fileLine;
while((fileLine = reader.readLine()) != null){
String tag = fileLine.substring(fileLine.indexOf("<") + 1, fileLine.indexOf(">"));
}
I need to find the information inside the <title> tags, but I can't figure out how to get that information without getting tags I don't need or how to handle cases where there are no tags.
I want to take the information in the title tag and turn it into a string that I can use.
Upvotes: 2
Views: 4286
Reputation: 36
String fileDataString = Files.readAllLines(Paths.get(fileName), Charset.forName("UTF-8")).stream().collect(Collectors.joining("\n"));
String title = StringUtils.substringBetween(fileDataString, "<title>", "</title>");
This should get the text between <title> and </title>. StringUtils comes from Apache Commons Lang, and substringBetween returns null if either tag is missing.
EDIT: Thank you BlackPearl for the Stream<String>.collect(Collectors.joining("\n")) suggestion.
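If external libraries are off the table for the homework, the same idea can be sketched with plain String.indexOf. This is only a rough sketch: the class name TitleExtractor and both method names are made up for illustration, and it assumes the tag is written exactly as lowercase <title> with no attributes. It returns null when either tag is missing, which covers the "no tags" case from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the names here are illustrative, not from any library.
class TitleExtractor {

    // Returns the text between <title> and </title>, or null if either tag is missing.
    // Assumption: the tags are lowercase and carry no attributes.
    static String extractTitle(String html) {
        String openTag = "<title>";
        String closeTag = "</title>";

        int open = html.indexOf(openTag);
        if (open < 0) {
            return null; // no opening tag
        }
        int start = open + openTag.length();

        // Search for the closing tag only after the opening tag.
        int end = html.indexOf(closeTag, start);
        if (end < 0) {
            return null; // no closing tag
        }
        return html.substring(start, end).trim();
    }

    // Reads the whole file as UTF-8 so a title spanning several lines is still found.
    static String extractTitleFromFile(String fileName) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
        return extractTitle(html);
    }
}
```

Reading the whole file first, rather than going line by line, avoids the case where <title> and </title> sit on different lines.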
Upvotes: 2