newarsenic

Reputation: 151

How do I parse an HTML file without using Jsoup?

I need to parse an HTML file for a homework project, so I can't use Jsoup.

I have tried crawling through the file, but I don't know how to save what I'm looking for.

This is what I have:

    FileInputStream fis = new FileInputStream(filename);
    InputStreamReader inStream = new InputStreamReader(fis);
    BufferedReader reader = new BufferedReader(inStream);

    String fileLine;
    while((fileLine = reader.readLine()) != null){

        // Note: throws StringIndexOutOfBoundsException when a line has no '<' or '>'
        String tag = fileLine.substring(fileLine.indexOf("<") + 1,fileLine.indexOf(">"));
    }

I need to find the information inside the <title> tags, but I can't figure out how to get that information without also getting tags I don't need, or how to handle cases where there are no tags.

I want to take the information in the title tag and turn it into a string that I can use.

Upvotes: 2

Views: 4286

Answers (1)

Mike de Groot

Reputation: 36

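    // Requires java.nio.file.Files/Paths, java.nio.charset.Charset, java.util.stream.Collectors,
    // and org.apache.commons.lang3.StringUtils (Apache Commons Lang).
    String fileDataString = Files.readAllLines(Paths.get(fileName), Charset.forName("UTF-8")).stream().collect(Collectors.joining("\n"));

    // substringBetween returns null when either tag is missing.
    String title = StringUtils.substringBetween(fileDataString, "<title>", "</title>");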

This should work to get the text between <title> and </title>.

EDIT: Thank you BlackPearl for the Stream<String>.collect(Collectors.joining("\n")); suggestion
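
If third-party libraries such as Apache Commons Lang are also off limits for the assignment, roughly the same thing can be done with the standard library alone. The following is only a minimal sketch, not a general HTML parser: the class name TitleExtractor and its method names are made up for illustration, and it assumes the file is UTF-8 and contains at most one simple <title> element. It returns an empty Optional when no title is found, which covers the "no tags" case from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class TitleExtractor {

    // Matches <title ...>text</title>. CASE_INSENSITIVE accepts <TITLE>,
    // and DOTALL lets the title text span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or an empty Optional if there is no title element.
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    // Reads the whole file as UTF-8 (an assumption) and extracts the title from it.
    static Optional<String> extractTitleFromFile(String fileName) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
        return extractTitle(html);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TitleExtractor <file.html>");
            return;
        }
        System.out.println(extractTitleFromFile(args[0]).orElse("(no title found)"));
    }
}
```

Reading the whole file first, rather than going line by line, avoids missing a title whose opening and closing tags sit on different lines. A regular expression like this is fine for pulling out one known tag, but it will not cope with arbitrary or malformed HTML.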

Upvotes: 2
