How do I parse an html file without using Jsoup?

Question

I need to parse an HTML file for a homework project, so I can't use Jsoup.

I have tried crawling through the file, but I don't know how to save what I'm looking for.

This is what I have:

    FileInputStream fis = new FileInputStream(filename);
    InputStreamReader inStream = new InputStreamReader(fis);
    BufferedReader reader = new BufferedReader(inStream);

    String fileLine;
    while((fileLine = reader.readLine()) != null){

        String tag = fileLine.substring(fileLine.indexOf("<") + 1, fileLine.indexOf(">"));
    }

I need to find the information inside the <title> tags, but I can't figure out how to get it without also picking up tags I don't need, or how to handle lines that contain no tags at all.

I want to take the information in the title tag and turn it into a string that I can use.

Mike de Groot · Accepted Answer

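    // Read the whole file and join its lines back into a single string.
    String fileDataString = Files.readAllLines(Paths.get(fileName), Charset.forName("UTF-8")).stream().collect(Collectors.joining("\n"));

    // StringUtils is org.apache.commons.lang3.StringUtils (Apache Commons Lang).
    String title = StringUtils.substringBetween(fileDataString, "<title>", "</title>");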

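This should get the text between <title> and </title>. StringUtils comes from Apache Commons Lang, and substringBetween returns null when either tag is missing, so check for null before using the result.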

EDIT: Thank you BlackPearl for the Stream.collect(Collectors.joining("\n")); suggestion
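If Apache Commons Lang is also off limits for the homework, the same idea can be sketched with only the JDK. This is a minimal sketch, not a general HTML parser: the class name TitleExtractor and both method names are made up for illustration, it assumes the file is UTF-8, and it assumes the document has at most one title element that you care about. The regex ignores case, tolerates attributes on the opening tag, and lets the title span several lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class TitleExtractor {
    // Matches <title ...>text</title>, ignoring case and spanning line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or an empty Optional when there is
    // no complete <title>...</title> pair in the input.
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    // Reads the whole file as UTF-8 (an assumption about the encoding)
    // and extracts the title from its contents.
    static Optional<String> readTitle(String fileName) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
        return extractTitle(html);
    }
}
```

Calling TitleExtractor.readTitle(filename).orElse("") gives you a plain String, with an empty string standing in for "no title found". Reading the whole file at once, rather than line by line, is what makes it possible to handle a title that is split across lines.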
