Reputation: 979
I have been given a file that contains many paragraphs. I want to read it one paragraph at a time and perform operations on each paragraph.
final String PARAGRAPH_SPLIT_REGEX = "(?m)(?=^\\s{4})";
String currentLine;
final BufferedReader bf = new BufferedReader(new FileReader("filename"));
currentLine = bf.readLine();
final StringBuilder stringBuilder = new StringBuilder();
while (currentLine != null) {
    stringBuilder.append(currentLine);
    stringBuilder.append(System.lineSeparator());
    currentLine = bf.readLine();
}
String[] paragraph = new String[stringBuilder.length()];
if (stringBuilder != null) {
    final String value = stringBuilder.toString();
    paragraph = value.split(PARAGRAPH_SPLIT_REGEX);
}
for (final String s : paragraph) {
    System.out.println(s);
}
File (Every paragraph has a space of 2 characters before it, and there is no blank line between paragraphs):
Story
Her companions instrument set estimating sex remarkably solicitude motionless. Property men the why smallest graceful day insisted required. Inquiry justice country old placing sitting any ten age. Looking venture justice in evident in totally he do ability. Be is lose girl long of up give.
"Trifling wondered unpacked ye at he. In household certainty an on tolerably smallness difficult. Many no each like up be is next neat. Put not enjoyment behaviour her supposing. At he pulled object others."
Passage its ten led hearted removal cordial. Preference any astonished unreserved mrs. Prosperous understood middletons in conviction an uncommonly do. Supposing so be resolving breakfast am or perfectly. Is drew am hill from mr. Valley by oh twenty direct me so.
Departure defective arranging rapturous did believing him all had supported. Family months lasted simple set nature vulgar him. "Picture for attempt joy excited ten carried manners talking how. Suspicion neglected he resolving agreement perceived at an."
However, I am not getting the desired output. The paragraph variable contains only two values.
I guess the regex I am using here is not working. I took it from the question "Splitting text into paragraphs with regex JAVA".
I am using Java 8.
Upvotes: 2
Views: 4057
Reputation: 3658
You can use a Scanner with a delimiter to iterate over the text. For example:
Scanner scanner = new Scanner(text).useDelimiter("\n ");
while (scanner.hasNext()) {
    String paragraph = scanner.next();
    System.out.println("# " + paragraph);
}
The output is:
# Story
# Her companions instrument set estimating sex remarkably solicitude motionless. Property men the why smallest graceful day insisted required. Inquiry justice country old placing sitting any ten age. Looking venture justice in evident in totally he do ability. Be is lose girl long of up give.
# "Trifling wondered unpacked ye at he. In household certainty an on tolerably smallness difficult. Many no each like up be is next neat. Put not enjoyment behaviour her supposing. At he pulled object others."
# Passage its ten led hearted removal cordial. Preference any astonished unreserved mrs. Prosperous understood middletons in conviction an uncommonly do. Supposing so be resolving breakfast am or perfectly. Is drew am hill from mr. Valley by oh twenty direct me so.
# Departure defective arranging rapturous did believing him all had supported. Family months lasted simple set nature vulgar him. "Picture for attempt joy excited ten carried manners talking how. Suspicion neglected he resolving agreement perceived at an."
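For reference, here is a self-contained sketch of the same Scanner idea. It assumes every paragraph starts on a new line indented by exactly two spaces, as described in the question; the class name ScannerParagraphs, the method name paragraphs, and the sample text are made up for illustration. The delimiter is a regex, so it accepts both "\n" and "\r\n" line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerParagraphs {
    // Assumption: a paragraph begins after a line break followed by two spaces.
    // The delimiter consumes the line break and the two-space indent.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("\\r?\\n {2}")) {
            while (scanner.hasNext()) {
                // trim() drops the indent of the first paragraph and the
                // trailing line break of the last one.
                result.add(scanner.next().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "  Story\n  First paragraph, line one.\ncontinues here.\n  Second paragraph.\n";
        for (String p : paragraphs(text)) {
            System.out.println("# " + p);
        }
    }
}
```

Lines that are not indented stay attached to the paragraph before them, because only an indented line start counts as a delimiter.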
Upvotes: 2
Reputation:
You could just find every indented paragraph with a global match, then add each match to a list.
"(?m)^[^\\S\\r\\n]{2,}\\S.*(?:\\r?\\n|$)(?:^\\S.*(?:\\r?\\n|$))*"
Explanation
(?m)                     # Multi-line mode ( ^ = begin of line )
^ [^\S\r\n]{2,}          # Begin of Paragraph, 2 or more horizontal wsp at BOL
\S .*                    # Rest of line, must be non-wsp as first letter.
(?: \r? \n | $ )
(?:                      # Optional, many more lines of this paragraph
     ^ \S .*
     (?: \r? \n | $ )
)*
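A minimal sketch of how this regex might be applied with java.util.regex, assuming the whole file has already been read into a String. The class name RegexParagraphs, the method name find, and the sample text are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexParagraphs {
    // The pattern from this answer, compiled once and reused.
    private static final Pattern PARAGRAPH = Pattern.compile(
        "(?m)^[^\\S\\r\\n]{2,}\\S.*(?:\\r?\\n|$)(?:^\\S.*(?:\\r?\\n|$))*");

    // Collects every match (one indented paragraph each) into a list.
    static List<String> find(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PARAGRAPH.matcher(text);
        while (matcher.find()) {
            // trim() removes the leading indent and the trailing line break.
            result.add(matcher.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "  Story\n  First line.\ncontinues here.\n  Second.\n";
        for (String p : find(text)) {
            System.out.println("# " + p);
        }
    }
}
```

Unlike a split, this only returns text that matches the paragraph shape, so unindented lines before the first indented line would be skipped.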
Upvotes: 0
Reputation: 979
Following Jason's comment, I tried his approach. I think I have the desired outcome; however, I am not pleased with the approach, because time and space complexity have increased. I might improve it later.
currentLine = bf.readLine();
List<List<String>> paragraphs = new LinkedList<>();
int counter = 0;
while (currentLine != null) {
    if (paragraphs.isEmpty()) {
        List<String> paragraph = new LinkedList<>();
        paragraph.add(currentLine);
        paragraph.add(System.lineSeparator());
        paragraphs.add(paragraph);
        currentLine = bf.readLine();
        continue;
    }
    if (currentLine.startsWith(" ")) {
        List<String> paragraph = new LinkedList<>();
        paragraph.add(currentLine);
        counter = counter + 1;
        paragraphs.add(paragraph);
    } else {
        List<String> continuedParagraph = paragraphs.get(counter);
        continuedParagraph.add(currentLine);
    }
    currentLine = bf.readLine();
}
for (final List<String> story : paragraphs) {
    for (final String s : story) {
        System.out.println(s);
    }
}
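One possible simplification of the same line-by-line idea, as a sketch rather than a drop-in replacement: keep a single StringBuilder for the paragraph in progress and flush it whenever an indented line shows up. This makes one pass over the input and avoids the nested lists and the index counter. It assumes a two-space indent marks a new paragraph; the names LineGrouping, readParagraphs, and fromText are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineGrouping {
    // Groups lines into paragraphs; an indented line starts a new paragraph.
    static List<String> readParagraphs(BufferedReader reader) throws IOException {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            // Flush the paragraph in progress before starting a new one.
            if (line.startsWith("  ") && current.length() > 0) {
                paragraphs.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append('\n'); // keep line breaks inside a paragraph
            }
            current.append(line.trim());
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString()); // last paragraph
        }
        return paragraphs;
    }

    // Convenience wrapper for in-memory text, used here for demonstration.
    static List<String> fromText(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return readParagraphs(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "  Story\n  First line.\ncontinues here.\n  Second.\n";
        for (String p : fromText(text)) {
            System.out.println("# " + p);
        }
    }
}
```

For a real file, the BufferedReader wrapping the FileReader can be passed to readParagraphs directly.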
Upvotes: 1