user1127643

Reputation: 169

Best way to extract specific paragraph from file data

Hi, I am looking for the best way to extract specific paragraphs from a file using Java.

From the following data I need to extract the text from "D and A" to "Testing1- End", and from the second paragraph's "D and A" to "Testing2- End".

Please guide me to the best way to get these values. Thanks.

File data (example):

Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata
Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata
Testingdata Testingdata Testingdata

D and A Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1- End

                                                Date 11/30/11           Page    2

D and A Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Date 11/30/11 Page 3 D and A Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 -End

Upvotes: 2

Views: 2569

Answers (2)

Nishant

Reputation: 55856

For an input like this:

Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata Testingdata

D and A
Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1- End

                                              Date 11/30/11           Page    2

D and A
Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2

                                              Date 11/30/11           Page    3

D and A
Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2- End

the following regex will help you out:

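    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    StringBuilder input = new StringBuilder();
    // The file path will be something like "D:/test1.txt" or "/home/naishe/test1.txt".
    try (BufferedReader br = new BufferedReader(new FileReader("path/to/text/file"))) {
        String line;
        while ((line = br.readLine()) != null) {
            input.append(line).append("\n");
        }
    }

    // Lazily match from "D and A" to the nearest "Testing1- End" or "Testing2- End".
    // Without Pattern.DOTALL, '.' does not cross line breaks.
    Pattern p = Pattern.compile("(D and A\\s).*?(Testing(1|2)\\- End)");
    Matcher m = p.matcher(input);
    while (m.find()) {
        System.out.println("MATCHED:\n" + m.group());
    }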

This gives:

MATCHED:
D and A
Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1Testing1 Testing1 Testing1 Testing1 Testing1 Testing1 Testing1- End

MATCHED:
D and A
Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2 Testing2- End
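Note that the second match above starts at the last "D and A", after the "Page 3" line, because `.` does not match line breaks by default. If the whole second paragraph is wanted, including the part before the page break, one option is to compile the pattern with `Pattern.DOTALL`. The following is only a sketch under that assumption; the class name `DotallParagraphs` and the method `extract` are made up for illustration, and the page-header line is left inside the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallParagraphs {
    // DOTALL lets '.' match line breaks too; the lazy '.*?' still stops
    // at the nearest "Testing1- End" or "Testing2- End".
    private static final Pattern PARAGRAPH =
            Pattern.compile("D and A\\s.*?Testing[12]- End", Pattern.DOTALL);

    // Returns every "D and A ... TestingN- End" span found in the input.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the file contents shown above.
        String input = "Testingdata Testingdata\n\n"
                + "D and A\nTesting1 Testing1- End\n\n"
                + "      Date 11/30/11   Page   2\n\n"
                + "D and A\nTesting2 Testing2\n\n"
                + "      Date 11/30/11   Page   3\n\n"
                + "D and A\nTesting2 Testing2- End\n";
        for (String paragraph : extract(input)) {
            System.out.println("MATCHED:\n" + paragraph + "\n");
        }
    }
}
```

With this flag the second match runs from the second "D and A" through the page-header line to "Testing2- End", so the header and the repeated "D and A" would still have to be stripped afterwards if they are not wanted.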

Upvotes: 1

Tjen Wellens

Reputation: 165

I would read the file line by line, as in this tutorial.

You can then check if the line contains a certain string.

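    boolean readFollowingLines = false;
    ArrayList<String> paragraph = new ArrayList<String>(); // needs import java.util.ArrayList
    String string;
    // "reader" stands for a BufferedReader opened on the file (not shown here).
    while ((string = reader.readLine()) != null) {
        // indexOf returns -1 for "not found" and 0 for a hit at the start of the line, so test >= 0.
        if (string.indexOf("1- End") >= 0)
            readFollowingLines = false; // stop before the end-marker line
        if (readFollowingLines)
            paragraph.add(string);
        if (string.indexOf("D and A") >= 0)
            readFollowingLines = true;  // start collecting from the next line
    }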

If you want more than one paragraph, you need to extend this a little. Anyway, I'd probably do it something like this.
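One possible way to extend the line-by-line idea to several paragraphs is sketched below. It is not the only way to do it: the class name `LineScanner`, the method `extract`, and the choice of "- End" as a generic end marker are assumptions made for illustration. Unlike the fragment above, it keeps the marker lines themselves and also handles a start and end marker that sit on the same line.

```java
import java.util.ArrayList;
import java.util.List;

class LineScanner {
    // Collects every block of whole lines that starts at a line containing
    // "D and A" and ends at the next line containing "- End".
    static List<String> extract(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a paragraph
        for (String line : lines) {
            if (current == null && line.contains("D and A")) {
                current = new StringBuilder(); // start marker opens a paragraph
            }
            if (current != null) {
                current.append(line).append('\n');
                if (line.contains("- End")) { // end marker closes the paragraph
                    paragraphs.add(current.toString());
                    current = null;
                }
            }
        }
        return paragraphs;
    }
}
```

The lines could come from the `readLine()` loop above or from `Files.readAllLines(Paths.get("path/to/text/file"))`. A paragraph that continues across a page break keeps its "Date ... Page N" line and the repeated "D and A" header, so those would need to be filtered out separately if they are not wanted.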

Upvotes: 1
