Reputation: 1167
I have a text file which contains content scraped from webpages. The text file is structured like this:
|NEWTAB|lkfalskdjlskjdflsj|NEWTAB|lkjsldkjslkdjf|NEWTAB|sdlfkjsldkjf|NEWLINE|lksjlkjsdl|NEWTAB|lkjlkjlkj|NEWTAB|sdkjlkjsld
|NEWLINE| indicates the start of a new line (i.e., a new row in the data).
|NEWTAB| indicates the start of a new field within a line (i.e., a new column in the data).
I need to split the text file into fields and lines and store them in an array or some other data structure. Content between |NEWLINE| strings may contain actual line breaks (i.e. \n), but these don't indicate a new row in the data.
I started by reading each character in one by one and looking at sets of 8 consecutive characters to see if they contained |NEWTAB|. My method proved to be unreliable and ugly. I am looking for the best practice on this. Would the best method be to read the whole text file in as a single string, and then use a string split on "|NEWLINE|" and then string splits on the resulting strings using "|NEWTAB|"?
Many thanks!
Upvotes: 2
Views: 2869
Reputation: 7579
You could do something like this:
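// Needs java.io.File and java.util.{ArrayList, List, Scanner}.
// Splits on every '|', so it only works if the field text never contains '|'.
// A field that contains a real line break ends up as two separate entries.
Scanner scanner = new Scanner(new File("myFile.txt"));
List<List<String>> rows = new ArrayList<List<String>>();
List<String> column = new ArrayList<String>();
while (scanner.hasNextLine()) {
    for (String elem : scanner.nextLine().split("\\|")) {
        if (elem.equals("NEWTAB") || elem.equals(""))
            continue; // skip field markers and the empty strings around them
        else if (elem.equals("NEWLINE")) {
            rows.add(column); // row marker: store the finished row
            column = new ArrayList<String>();
        } else
            column.add(elem);
    }
}
if (!column.isEmpty())
    rows.add(column); // the last row has no trailing |NEWLINE|
scanner.close();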
Took me a while to write it up, since I don't have IntelliJ or Eclipse on this computer and had to use Emacs.
EDIT: This is a bit more verbose than I like, but it works with |s that are part of the text:
Scanner scanner = new Scanner(new File("myFile.txt"));
List<List<String>> rows = new ArrayList<List<String>>();
List<String> lines = new ArrayList<String>();
String line = "";
while (scanner.hasNextLine()) {
    // nextLine() strips the line break, so line breaks inside a field are dropped here
    line += scanner.nextLine();
    int index;
    // cut off every complete row that has accumulated so far
    while ((index = line.indexOf("|NEWLINE|")) >= 0) {
        lines.add(line.substring(0, index));
        line = line.substring(index + "|NEWLINE|".length());
    }
}
scanner.close();
if (!line.equals(""))
    lines.add(line); // whatever follows the last |NEWLINE|
for (String l : lines) {
    List<String> columns = new ArrayList<String>();
    for (String column : l.split("\\|NEWTAB\\|"))
        if (!column.equals("")) // note: this also drops empty fields
            columns.add(column);
    rows.add(columns);
}
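A shorter variant of the same idea might be to let Scanner do the row splitting by setting its delimiter to |NEWLINE|. This is only a sketch: the class name ScannerRows and the method readRows are made up for illustration, and I have not checked how it treats a file that starts with |NEWLINE| or has two of them in a row, so test that on your data first. Line breaks inside a field are kept, because the line break is no longer a delimiter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class ScannerRows {

    // Reads rows separated by |NEWLINE| and splits each row on |NEWTAB|.
    static List<List<String>> readRows(Readable source) {
        List<List<String>> rows = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            // Pattern.quote turns the marker into a literal, so '|' is not regex alternation.
            scanner.useDelimiter(Pattern.quote("|NEWLINE|"));
            while (scanner.hasNext()) {
                String row = scanner.next();
                // limit -1 keeps trailing empty fields instead of silently dropping them
                rows.add(Arrays.asList(row.split(Pattern.quote("|NEWTAB|"), -1)));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Small in-memory demo; for a real file, pass a java.io.FileReader instead.
        System.out.println(readRows(new StringReader("a|NEWTAB|b|NEWLINE|c|NEWTAB|d")));
    }
}
```

Unlike the loop above, this keeps empty fields, so the column positions stay aligned between rows.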
Upvotes: 1
Reputation: 1167
I think that the other answers will work too, but my solution is as follows:
// Needs java.io.FileReader.
FileReader inputStream = null;
StringBuilder builder = new StringBuilder();
try {
    inputStream = new FileReader(args[0]);
    int c;
    // read the whole file into memory, one character at a time
    while ((c = inputStream.read()) != -1) {
        builder.append((char) c);
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
}
String myString = builder.toString();
// the pipes are escaped because split() takes a regular expression
String[] rows = myString.split("\\|NEWLINE\\|");
for (String row : rows) {
    String[] cols = row.split("\\|NEWTAB\\|");
    /* do something with cols - e.g., store */
}
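For anyone who wants the result in a nested list rather than handling cols inside the loop, the same split-twice approach could be wrapped up roughly like this. It is a sketch under a few assumptions of mine: the class and method names (DelimitedParser, parse, parseFile) are invented, the file is assumed to be UTF-8, and the whole file is assumed to fit in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class DelimitedParser {
    // Pattern.quote makes the markers literal, so the '|' characters need no manual escaping.
    private static final Pattern ROW = Pattern.compile(Pattern.quote("|NEWLINE|"));
    private static final Pattern COL = Pattern.compile(Pattern.quote("|NEWTAB|"));

    // Splits the text into rows on |NEWLINE|, then each row into fields on |NEWTAB|.
    static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        // limit -1 keeps trailing empty strings, so empty fields are not lost
        for (String row : ROW.split(text, -1)) {
            rows.add(new ArrayList<>(Arrays.asList(COL.split(row, -1))));
        }
        return rows;
    }

    // Reads the whole file at once (assumes UTF-8) and parses it.
    static List<List<String>> parseFile(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return parse(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        for (List<String> row : parseFile(args[0])) {
            System.out.println(row);
        }
    }
}
```

A row that begins with |NEWTAB|, as in the sample in the question, comes back with an empty first field; drop it afterwards if it is not wanted.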
Upvotes: 1