dimcode

Reputation: 213

Reading a .txt file in Java line by line gives unexpected results

I am trying to read a .txt file in Java and build a list of lists, with each line of the file stored as its own inner list. This worked for every file I tried, except facebook_combined.txt.gz (available at this link), which is not read the way I expect. Example:

If the first line of some other .txt file is 52 99 45 61 70 45 and the second is 70 80 65 91, then my code should create a list of lists named lines that looks like this:

lines=[[52,99,45,61,70,45][70,80,65,91]].

But for facebook_combined.txt, if we suppose that its first line is 0 10 20 30 40 50, the same code creates lines like this:

lines=[[0,1][0,2][0,3][0,4][0,5][0,...]].

The code I use is below:

ArrayList<ArrayList<String>> lines = new ArrayList<ArrayList<String>>();

//read the file
FileInputStream fstream = new FileInputStream("C:\\Users\\facebook_combined.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));

while (true)//while the file was read
{
    String line = br.readLine();//split the file into the lines
    if (line == null) 
    {
        break;//if there are no more lines left
    }

    Scanner tokenize = new Scanner(line);// split the lines into tokens and make into an arraylist
    ArrayList<String> tokens = new ArrayList<String>();

    while (tokenize.hasNext()) //while there are still more
    {
        tokens.add(tokenize.next());
    }
    lines.add(tokens);
}
br.close();

Upvotes: 0

Views: 183

Answers (3)

FriedSaucePots

Reputation: 1352

I downloaded the dataset and extracted the text file with 7-Zip, and it looks like your program is working. Once extracted, the data looks something like this (viewed in Notepad++):

0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
...etc...

I also opened the file in regular Notepad, where the line breaks are not displayed, so that may have caused the confusion (that is, the data looks like 0 10 20 30 40... in Notepad).
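To check which line terminators the file really contains, independent of any editor, one option is to count them in the raw bytes. This is a minimal sketch, not part of the original program: the class and method names are made up for illustration, and the sample bytes in main only assume Unix-style endings (LF without CR), which I have not verified for the real file. For the real data you could pass in the result of Files.readAllBytes on your own path.

```java
import java.nio.charset.StandardCharsets;

class LineEndingCheck {
    /** Returns {count of CR bytes (13), count of LF bytes (10)} found in data. */
    public static int[] countLineTerminators(byte[] data) {
        int cr = 0;
        int lf = 0;
        for (byte b : data) {
            if (b == '\r') {
                cr++;
            } else if (b == '\n') {
                lf++;
            }
        }
        return new int[] {cr, lf};
    }

    public static void main(String[] args) {
        // Made-up sample in the same shape as the extracted file, with LF-only endings.
        byte[] sample = "0 1\n0 2\n0 3\n".getBytes(StandardCharsets.US_ASCII);
        int[] counts = countLineTerminators(sample);
        System.out.println("CR=" + counts[0] + " LF=" + counts[1]); // prints CR=0 LF=3
    }
}
```

A non-zero LF count means the file does have one record per line, which is also what BufferedReader.readLine() will see.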


EDIT: Updated Explanation

In response to the OP's comment:

"You are right for the way that the data look like in notepad++ but the right version is 0 10 20 30"

I am not sure that is correct. Consider Occam's razor: you are assuming the data should be parsed as 0 10 20 30 even though the file contains very explicit line breaks. If the file were not supposed to have line breaks, it would not have them. It also does not look like a formatting error, since the format is consistently a pair of numbers followed by a line break. Nothing points to the data being meant to be parsed as 0 10 20 30 40 . . .

The file facebook_combined.txt looks to be a list of edges in a graph where each edge is a friendship between two people.
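If the file is an edge list, then grouping the pairs by their first number gives something close to the "list per node" shape. The following is only a sketch under that assumption: the class and method names are hypothetical, the last sample line is made up, and each edge is stored in one direction only, although a friendship is presumably mutual.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EdgeListSketch {
    /** Groups "a b" edge lines by their first node: a -> [b1, b2, ...]. */
    public static Map<Integer, List<Integer>> groupByFirstNode(List<String> edgeLines) {
        Map<Integer, List<Integer>> neighbours = new TreeMap<>();
        for (String line : edgeLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = trimmed.split("\\s+");
            int from = Integer.parseInt(parts[0]);
            int to = Integer.parseInt(parts[1]);
            neighbours.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        }
        return neighbours;
    }

    public static void main(String[] args) {
        // First three lines match the extract above; "1 48" is a made-up example.
        List<String> sample = Arrays.asList("0 1", "0 2", "0 3", "1 48");
        System.out.println(groupByFirstNode(sample)); // prints {0=[1, 2, 3], 1=[48]}
    }
}
```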

It looks like you are trying to read the "circles" of friends, where a circle is a list of numbers. If you download the other archive, "facebook.tar", it contains a couple of files with the extension .circles. Here is a snippet from one of those files.

circle0 71  215 54  61  298 229 81  253 193 97  264 29  132 110 163 259 183 334 245 222
circle1 173
circle2 155 99  327 140 116 147 144 150 270
circle3 51  83  237
circle4 125 344 295 257 55  122 223 59  268 280 84  156 258 236 250 239 69
circle5 23
circle6 337 289 93  17  111 52  137 343 192 35  326 310 214 32  115 321 209 312 41  20

These *.circles files seem to be in the format you are expecting (a list of lists of numbers).
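A rough sketch of how such lines could be turned into a list of lists, assuming each line is a label (such as circle0) followed by whitespace-separated integers. The class and method names are hypothetical, and the sample lines are taken from the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CirclesSketch {
    /** Parses lines like "circle3 51 83 237" into [[51, 83, 237], ...], dropping the label. */
    public static List<List<Integer>> parseCircles(List<String> lines) {
        List<List<Integer>> circles = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            // "\\s+" accepts single spaces, repeated spaces, and tabs alike.
            String[] parts = trimmed.split("\\s+");
            List<Integer> members = new ArrayList<>();
            for (int i = 1; i < parts.length; i++) { // parts[0] is the label
                members.add(Integer.parseInt(parts[i]));
            }
            circles.add(members);
        }
        return circles;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("circle1 173", "circle3 51  83  237");
        System.out.println(parseCircles(sample)); // prints [[173], [51, 83, 237]]
    }
}
```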

Upvotes: 2

Heriberto Ortega

Reputation: 11

Well, you say that the .txt file actually looks like

0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8

but you need it like

0 10 20 30 40 50

So I think you would need to read the whole file and then replace the line breaks.
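One possible reading of this suggestion, as a sketch: split the text on line breaks and join the pieces again with a separator of your choice. The class and method names are made up for illustration. Note what the two obvious separators produce: joining with nothing glues the last number of one line to the first number of the next, while joining with a space keeps every number separate.

```java
class JoinLinesSketch {
    /** Joins all lines of text with the given separator; "\\R" matches any line break (Java 8+). */
    public static String joinLines(String text, String separator) {
        return String.join(separator, text.trim().split("\\R"));
    }

    public static void main(String[] args) {
        String sample = "0 1\n0 2\n0 3\n";
        System.out.println(joinLines(sample, ""));  // prints 0 10 20 3
        System.out.println(joinLines(sample, " ")); // prints 0 1 0 2 0 3
    }
}
```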

Upvotes: 0

Heriberto Ortega

Reputation: 11

I think your code is kind of wrong. I don't usually use Scanner, but maybe you can use .split() instead.

I don't like while (true) loops, so I recommend changing that to this:

String line;
while ((line = br.readLine()) != null) {

And remove your:

String line = br.readLine();//split the file into the lines
if (line == null) 
{
    break;//if there are no more lines left
}

Then try to use split, something like this:

String[] tokenize = line.split(" ");
ArrayList<String> tokens = new ArrayList<String>();
for (String token : tokenize) {
    tokens.add(token);
}
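Putting the two suggestions together, a complete version could look roughly like the sketch below. The class and method names are hypothetical. Two details are my own substitutions rather than part of the snippets above: the split uses "\\s+" instead of " " so that tabs and repeated spaces do not produce empty tokens, and a blank line yields an empty list, as it would with the Scanner version. The reader is passed in as a parameter, so the same method works for a FileReader on your own path.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitLinesSketch {
    /** Reads every line from br and splits it into whitespace-separated tokens. */
    public static List<List<String>> readTokens(BufferedReader br) {
        List<List<String>> lines = new ArrayList<>();
        try {
            String line;
            while ((line = br.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    lines.add(new ArrayList<>()); // blank line: no tokens
                } else {
                    lines.add(new ArrayList<>(Arrays.asList(trimmed.split("\\s+"))));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file, using the example lines from the question.
        String sample = "52 99 45 61 70 45\n70 80 65 91\n";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            System.out.println(readTokens(br)); // prints [[52, 99, 45, 61, 70, 45], [70, 80, 65, 91]]
        }
    }
}
```

Run against the extracted facebook_combined.txt, this should still give one two-element list per line, since that is what the file contains.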

Upvotes: 0
