Keshavram Kuduwa

Reputation: 1060

Is there any way to read text from a .fdt/.fdx/.fdxt file in Java?

I want to count the number of words in a .fdt/.fdx/.fdxt file.

I converted the .fdxt file to .html and then parsed the result. It was successful in some cases but not all.

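    import java.io.File;
    import java.util.Scanner;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    // Read the converted HTML file into a single string.
    StringBuilder html = new StringBuilder();
    try (Scanner sc = new Scanner(new File("/home/de-10/Desktop/1.html"))) {
        while (sc.hasNextLine()) {
            // Keep the line break so words on adjacent lines are not glued together.
            html.append(sc.nextLine()).append('\n');
        }
    }

    System.out.println(html);

    // Strip the markup and keep only the visible text.
    Document doc = Jsoup.parse(html.toString());
    String data = doc.text();
    System.out.println(data);

    // Count whitespace-separated tokens.
    int wordCount = 0;
    try (Scanner sc1 = new Scanner(data)) {
        while (sc1.hasNext()) {
            sc1.next();
            wordCount++;
        }
    }

    System.out.println();
    System.out.println("**********");
    System.out.println("WordCount: " + wordCount);
    System.out.println("**********");
    System.out.println();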

I'm looking for an optimal solution.
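One possible way to skip the HTML conversion, sketched here under assumptions that should be checked against real files: .fdx is generally described as an XML format, and this sketch assumes the script text sits in `Text` elements nested inside `Paragraph` elements. Those element names, the class name `FdxText`, and the method `extractText` are illustrative and not taken from the question. Only the JDK's built-in XML parser is used.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public final class FdxText {
    private FdxText() {}

    /**
     * Returns the text of every assumed <Text> element, one line per
     * assumed <Paragraph> element. Parsing failures are rethrown unchecked.
     */
    public static String extractText(InputStream xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(xml);
            NodeList paragraphs = doc.getElementsByTagName("Paragraph");
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element paragraph = (Element) paragraphs.item(i);
                NodeList runs = paragraph.getElementsByTagName("Text");
                // Runs inside one paragraph are joined without a separator.
                for (int j = 0; j < runs.getLength(); j++) {
                    out.append(runs.item(j).getTextContent());
                }
                // A line break keeps words of adjacent paragraphs apart.
                out.append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to an .fdx file supplied by the caller.
        try (InputStream in = new FileInputStream(args[0])) {
            String text = extractText(in).trim();
            int words = text.isEmpty() ? 0 : text.split("\\s+").length;
            System.out.println("WordCount: " + words);
        }
    }
}
```

If the files turn out to use different element names, or if .fdt files are not XML at all, only the two tag names (or the whole parsing step) would need to change; the counting part stays the same.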

Upvotes: 1

Views: 71

Answers (1)

AYRM1112013

Reputation: 327

You said, "It was successful in some cases but not all." I suggest removing the punctuation from the text before counting, so that stand-alone punctuation such as a lone dash is not counted as a word.

    int wordCount = Jsoup.parse(html).text().replaceAll("\\p{Punct}", "").trim().split("\\s+").length;
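A one-line count like this has two edge cases worth guarding against: in Java, splitting an empty string still yields one element, and text that starts with whitespace yields an extra empty first element. Here is a minimal sketch of the same punctuation-stripping idea with those cases handled, using only the standard library. The class name `WordCounter` and method `countWords` are made up for illustration, and the input is assumed to be plain text that has already been extracted from the HTML.

```java
public final class WordCounter {
    private WordCounter() {}

    /** Counts whitespace-separated tokens after removing ASCII punctuation. */
    public static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        // \p{Punct} matches ASCII punctuation only, so "don't" becomes "dont"
        // and "well-known" becomes "wellknown"; each still counts as one word.
        String cleaned = text.replaceAll("\\p{Punct}", "").trim();
        if (cleaned.isEmpty()) {
            // "".split("\\s+") would return a one-element array, not an empty one.
            return 0;
        }
        return cleaned.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("INT. KITCHEN - DAY")); // prints 3
    }
}
```

Whether hyphenated words should count as one word or two depends on the counting rule you want; this sketch treats them as one.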

Upvotes: 0
