Reputation: 19826
I am currently retrieving some information from a text file (.txt) that contains several paragraphs. When I retrieve the String from the text file, I want to split it so that each paragraph is in its own String object.
Here is the text I get from the text file: http://www.carlowweather.com/plaintext.txt
I have tried to split the String on line feeds and on carriage returns, but neither appears to work. See my code below:
int pCount = 0;
public void parseData(String data) {
    String regex = "(\\n)";
    String split[] = data.split(regex);
    for (int i = 0; i < split.length; i++) {
        Log.e("e", pCount + " " + split[i]);
        pCount++;
    }
}
I have also tried "\r" and various other combinations I found by searching the net, but none of them seem to work on Android with this text file. I'm guessing the file doesn't contain line feeds or carriage returns at all, just blank lines?
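To check that guess, one option is to print the downloaded text with its control characters made visible. This is a minimal plain-Java sketch (on Android you would pass the result to Log.e instead of System.out); the class name ShowLineEndings, the method visible and the sample string are hypothetical.

```java
public class ShowLineEndings {
    // Replaces invisible characters with visible escapes so you can see which
    // line terminators the text really uses. Each "\n" is kept after its escape
    // so the output still breaks where the original lines did.
    static String visible(String data) {
        return data.replace("\r", "\\r")
                   .replace("\n", "\\n\n")
                   .replace("\t", "\\t");
    }

    public static void main(String[] args) {
        // Hypothetical sample; replace it with the String you actually downloaded
        String sample = " First line\r\n \r\n Next paragraph";
        System.out.print(visible(sample));
    }
}
```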
What is the best way to split the paragraphs into String objects?
Upvotes: 3
Views: 7498
Reputation: 75222
I think the easiest way to do this is with a Scanner.
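import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Inside a method declared "throws java.io.FileNotFoundException":
Scanner sc = new Scanner(new File("donal.txt"), "UTF-8");
// Paragraph break: a newline, optional spaces/tabs, then another newline
sc.useDelimiter("\n[ \t]*\n");
List<String> result = new ArrayList<String>();
int paragraphCount = 0;
while (sc.hasNext())
{
    String paragraph = sc.next();
    System.out.printf("%n%d:%n%s%n", ++paragraphCount, paragraph);
    result.add(paragraph);
}
sc.close();
System.out.printf("%n%d paragraphs found.%n", paragraphCount);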
The first and last paragraphs will actually be the header and footer; I don't know what you want to do about those.
For the sake of readability, I'm assuming the line separator is always the Unix-style \n, but to be safe you should allow for the Windows-style \r\n and the older Mac-style \r as well. That would make the regex:
"(?>\r\n|[\r\n])[ \t]*(?>\r\n|[\r\n])"
The atomic groups (?>...) stop the engine from backtracking and matching a single \r\n as two separate line breaks.
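A minimal sketch of the Scanner approach with a cross-platform delimiter, reading from a String instead of a file so it runs on its own. The class name ParagraphScanner, the method paragraphs and the sample text are hypothetical. Each line terminator is wrapped in an atomic group (?>...); with a plain (?:...) group the engine could backtrack and accept a lone \r\n as two terminators, which would turn every Windows line ending into a paragraph break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ParagraphScanner {
    // One terminator (\r\n, \n or \r), optional spaces/tabs, then a second
    // terminator. The atomic groups prevent one \r\n from counting as two.
    private static final String PARAGRAPH_BREAK =
            "(?>\r\n|[\r\n])[ \t]*(?>\r\n|[\r\n])";

    // Returns the text between paragraph breaks, in order
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter(PARAGRAPH_BREAK);
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String windows = " First paragraph,\r\n second line.\r\n \r\n Second paragraph.";
        String unix = windows.replace("\r\n", "\n");
        System.out.println(paragraphs(windows).size()); // 2
        System.out.println(paragraphs(unix).size());    // 2
    }
}
```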
Upvotes: 4
Reputation: 405
The code below tells you where each paragraph break is; what you do with it after that is up to you. It simply looks for lines that contain nothing but a single space (" "), which is how the blank lines in the file you linked to are written. I have included the code that reads the file, since your question doesn't show how you do that. One thought I had is that you might be reading the file line by line and then applying the regex to each line; readLine() strips the line terminators, so no individual line would ever contain one. I would expect the previous suggestions to work if you read the whole text file into a single String.
You could also move the code below into a separate function.
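import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader in = new BufferedReader(new FileReader("plaintext.txt"))) {
    String inputDataLine;
    // readLine() returns each line without its \n, \r\n or \r terminator
    while ((inputDataLine = in.readLine()) != null) {
        if (!inputDataLine.contentEquals(" ")) {
            System.out.println("What you want to do with a paragraph line");
        } else {
            System.out.println("What you want to do with a paragraph separator");
        }
    }
} catch (IOException e) {
    e.printStackTrace(); // don't silently swallow read errors
}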
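If you want the paragraphs themselves rather than just the separator positions, one option is to collect lines until a blank one appears. This sketch assumes a "blank" line is any line that is empty or whitespace-only (a generalization of the contentEquals(" ") check), trims each line, and joins a paragraph's lines with \n; ParagraphReader, readParagraphs and the sample text are hypothetical. It takes a Reader, so you could pass new FileReader("plaintext.txt") or, as here, a StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ParagraphReader {
    // Groups consecutive non-blank lines into paragraphs. readLine() has already
    // removed the \n, \r\n or \r terminator, so only the line content is checked.
    static List<String> readParagraphs(Reader source) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    // Blank or whitespace-only line: the current paragraph ends here
                    flush(current, paragraphs);
                } else {
                    if (current.length() > 0) {
                        current.append('\n');
                    }
                    current.append(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The last paragraph may not be followed by a blank line
        flush(current, paragraphs);
        return paragraphs;
    }

    // Adds the collected paragraph (if any) to the list and resets the buffer
    private static void flush(StringBuilder current, List<String> paragraphs) {
        if (current.length() > 0) {
            paragraphs.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String sample = " First paragraph,\r\n second line.\r\n \r\n Second paragraph.\r\n";
        for (String p : readParagraphs(new StringReader(sample))) {
            System.out.println("[" + p + "]");
        }
    }
}
```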
Upvotes: 3
Reputation: 530
I can't try it in Java right now, but the source file seems to have a space at the beginning of each line (including the blank ones), and it uses a <cr><lf> combination to go to the next line.
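A standard regex that matches such a blank line, while staying on the safe side by allowing any number of spaces (including none), is (the quotes are for the Java String definition):
"^ *$"
In Java, ^ and $ only match at each line boundary in MULTILINE mode (Pattern.MULTILINE, or the inline flag as in "(?m)^ *$"); otherwise they match only at the start and end of the whole input.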
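A minimal sketch of using that pattern to locate the blank lines; BlankLineFinder, blankLineOffsets and the sample text are hypothetical. It returns the character offset where each blank line starts, which you could then use to cut the text into paragraphs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankLineFinder {
    // MULTILINE lets ^ and $ match at every line boundary. Java does not let
    // them match between the \r and \n of a CRLF pair, so CRLF files work too.
    private static final Pattern BLANK_LINE = Pattern.compile("^ *$", Pattern.MULTILINE);

    // Returns the start offset of every line that contains only spaces (or nothing)
    static List<Integer> blankLineOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = BLANK_LINE.matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sample = " First paragraph\r\n \r\n Second paragraph";
        System.out.println(blankLineOffsets(sample)); // [18]
    }
}
```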
Upvotes: 1
Reputation: 18349
I think the problem is that there are several different characters between paragraphs (spaces, newlines and carriage returns). Try this; the regex requires at least two \n, so the single line breaks inside a paragraph are not treated as separators:
int pCount = 0;
public void parseData(String data) {
    String regex = "[ \\t\\r]*\\n(?:[ \\t\\r]*\\n)+[ \\t\\r]*"; // Only this line is changed.
    String split[] = data.split(regex);
    for (int i = 0; i < split.length; i++) {
        Log.e("e", pCount + " " + split[i]);
        pCount++;
    }
}
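Outside Android, the same idea can be tried as plain Java (System.out instead of Log.e). This sketch requires at least two \n per separator, each optionally surrounded by spaces, tabs or \r, so a single line break inside a paragraph is kept. ParagraphSplitter and the sample text are hypothetical, and trimming the input first is an assumption that avoids an empty first element when the data starts with a blank line.

```java
import java.util.Arrays;
import java.util.List;

public class ParagraphSplitter {
    // Two or more \n, each of which may be surrounded by spaces, tabs or \r
    private static final String PARAGRAPH_BREAK = "[ \\t\\r]*\\n(?:[ \\t\\r]*\\n)+[ \\t\\r]*";

    static List<String> split(String data) {
        // trim() drops leading/trailing whitespace, including line breaks
        return Arrays.asList(data.trim().split(PARAGRAPH_BREAK));
    }

    public static void main(String[] args) {
        String sample = " First paragraph,\r\n second line.\r\n \r\n Second paragraph.\r\n";
        int pCount = 0;
        for (String p : split(sample)) {
            System.out.println(pCount + " " + p);
            pCount++;
        }
    }
}
```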
Upvotes: 2