Reputation: 213
I am trying to read a .txt file in Java and build a list of lists, so that every line of the file goes into its own inner list. This worked for every file I tried, but with the facebook_combined.txt.gz file, which is at this link, it does not behave the way I expect. Example:
If the first line of another .txt file is like this
52 99 45 61 70 45
and the second like this
70 80 65 91
then my code should create a list of lists named lines, and lines must look like this:
lines=[[52,99,45,61,70,45],[70,80,65,91]].
But for the facebook_combined.txt file, if we suppose that its first line is 0 10 20 30 40 50,
the same code creates the list of lists lines like this:
lines=[[0,1],[0,2],[0,3],[0,4],[0,5],[0,...]].
The code I use is below:
ArrayList<ArrayList<String>> lines = new ArrayList<ArrayList<String>>();
// read the file
FileInputStream fstream = new FileInputStream("C:\\Users\\facebook_combined.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
while (true) // until the whole file has been read
{
    String line = br.readLine(); // read the file one line at a time
    if (line == null)
    {
        break; // no more lines left
    }
    Scanner tokenize = new Scanner(line); // split the line into tokens and put them into an ArrayList
    ArrayList<String> tokens = new ArrayList<String>();
    while (tokenize.hasNext()) // while there are still more tokens
    {
        tokens.add(tokenize.next());
    }
    lines.add(tokens);
}
br.close();
Upvotes: 0
Views: 183
Reputation: 1352
I downloaded the dataset and extracted the text file with 7-Zip, and it looks like your program is working. When you extract the file, the data looks something like this (viewed in Notepad++):
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
...etc...
I also opened the file with regular Notepad, where the line breaks are not visible, so that may have caused the confusion (that is, the data looks like 0 10 20 30 40... in Notepad).
EDIT: Updated Explanation
In response to OP's comment:
"You are right for the way that the data look like in notepad++ but the right version is 0 10 20 30"
I am not sure that is correct. Keep Occam's razor in mind: you are assuming the data should be parsed as 0 10 20 30 even though the file provides very explicit line breaks. If the file were not supposed to have them, it would not have them. It also does not look like a formatting error in the file, because the format is consistently a pair of numbers followed by a line break. Nothing points to the data being meant as 0 10 20 30 40 . . .
The file facebook_combined.txt looks to be a list of edges in a graph where each edge is a friendship between two people.
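If the pairs really are edges, one way to still end up with a list per node is to group the second number of each pair under the first. The following is only a minimal sketch of that idea, not something the dataset documentation prescribes: the class name EdgeListGrouper and the method groupBySource are hypothetical, the sample input in main is invented for illustration, and lines that are not exactly two tokens are assumed to be safe to skip.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the class and method names are illustrative only.
final class EdgeListGrouper {

    // Reads "a b" pairs, one per line, and collects every b under its a.
    static Map<String, List<String>> groupBySource(Reader source) throws IOException {
        Map<String, List<String>> neighbours = new LinkedHashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length != 2) {
                    continue; // assumption: blank or malformed lines can be skipped
                }
                neighbours.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
            }
        }
        return neighbours;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample input in the "pair per line" layout shown above.
        String sample = "0 1\n0 2\n0 3\n1 2\n";
        System.out.println(groupBySource(new StringReader(sample)));
        // prints {0=[1, 2, 3], 1=[2]}
    }
}
```

To run it against the real file, pass a java.io.FileReader for the extracted facebook_combined.txt instead of the StringReader. Note that this only records each edge in the direction it is written in the file.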
It looks like you are trying to read the "circles" of friends, where a circle is a list of numbers. If you download the other tar file, "facebook.tar", it contains a couple of files with the extension .circles. Here is a snippet from one of those files:
circle0 71 215 54 61 298 229 81 253 193 97 264 29 132 110 163 259 183 334 245 222
circle1 173
circle2 155 99 327 140 116 147 144 150 270
circle3 51 83 237
circle4 125 344 295 257 55 122 223 59 268 280 84 156 258 236 250 239 69
circle5 23
circle6 337 289 93 17 111 52 137 343 192 35 326 310 214 32 115 321 209 312 41 20
These *.circles files seem to be in the format you are expecting (a list of lists of numbers).
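Reading such a file into a list of lists could look roughly like the sketch below. It assumes that the first token on each line is a label such as circle0 and that the remaining tokens are the members, which is only what the snippet above suggests. The names CirclesParser and parseCircles are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
final class CirclesParser {

    // Turns each "label id id id ..." line into one inner list of ids.
    static List<List<String>> parseCircles(Reader source) throws IOException {
        List<List<String>> circles = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length < 2) {
                    continue; // blank line, or a label with no members
                }
                // parts[0] is assumed to be the label (e.g. "circle0"); drop it.
                circles.add(new ArrayList<>(Arrays.asList(parts).subList(1, parts.length)));
            }
        }
        return circles;
    }

    public static void main(String[] args) throws IOException {
        // Three lines taken from the snippet above.
        String sample = "circle1 173\ncircle3 51 83 237\ncircle5 23\n";
        System.out.println(parseCircles(new StringReader(sample)));
        // prints [[173], [51, 83, 237], [23]]
    }
}
```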
Upvotes: 2
Reputation: 11
Well, you are saying that the .txt file actually looks like
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
but you need it like
0 10 20 30 40 50
So I think you would need to read the whole file and then replace the line breaks.
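A rough sketch of that idea, assuming the whole file content has already been read into a String; the names LineBreakFlattener and flatten are made up for this example. Be aware that joining the lines this way yields the tokens 0, 1, 0, 2, ... in order, not 0, 10, 20, ...

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
final class LineBreakFlattener {

    // Replaces every line break with a space, then splits on whitespace.
    static List<String> flatten(String content) {
        String oneLine = content.replaceAll("\\R", " ").trim();
        if (oneLine.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(oneLine.split("\\s+")));
    }

    public static void main(String[] args) {
        System.out.println(flatten("0 1\n0 2\n0 3\n"));
        // prints [0, 1, 0, 2, 0, 3]
    }
}
```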
Upvotes: 0
Reputation: 11
I think your code is somewhat wrong. I don't usually use Scanner, but maybe you can use .split() instead.
I don't like while(true) loops, so I recommend changing that to this:
String line;
while ((line = br.readLine()) != null) {
And remove your:
String line = br.readLine();//split the file into the lines
if (line == null)
{
    break;//if there are no more lines left
}
Then try to use split, something like this:
String[] tokenize = line.split(" ");
ArrayList<String> tokens = new ArrayList<String>();
for (String token : tokenize) {
    tokens.add(token);
}
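Put together, the whole loop might look like the sketch below. The names SplitLineReader and readLines are made up here, and it splits on "\\s+" rather than a single space so that tabs or repeated spaces do not produce empty tokens; that regex is my assumption about the input, not something the file requires.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
final class SplitLineReader {

    // Builds one inner list of tokens per input line.
    static List<List<String>> readLines(Reader source) throws IOException {
        List<List<String>> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                List<String> tokens = new ArrayList<>();
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) { // a blank line splits into one empty string
                        tokens.add(token);
                    }
                }
                lines.add(tokens);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The two example lines from the question.
        String sample = "52 99 45 61 70 45\n70 80 65 91\n";
        System.out.println(readLines(new StringReader(sample)));
        // prints [[52, 99, 45, 61, 70, 45], [70, 80, 65, 91]]
    }
}
```

This produces the same result as the Scanner version for the examples in the question, so on its own it will not change what you get for facebook_combined.txt.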
Upvotes: 0