inherithandle

Reputation: 2664

Java: any idea to parse the text file?

I'm totally new to the Java language. I find it difficult to manipulate strings efficiently, even though I've learned a lot about the String class and file I/O. After writing some code, I found that the split method of the String class is not a panacea. I'd like to parse a text file like this:

1   (201, <202,203>), (203, <204,208>), (204, <>)
2   (201, <202,203>), (204, <>)
3   (201, <202,203>), (203, <204,208>)
4   (201, <202,203>), (202, <>), (208, <>)
5   (202, <>), (208, <>)
6   (202, <>)

The first column is part of the text file's content, not a line number. After reading the first line, I'd like to receive 1, 201, 202, 203, 203, 204, 208, and 204 as int values, in that order. Which String methods would be good to use? Thank you in advance.


Code (you may not need it):

import java.io.*;

public class IF_Parser
{       
    private FileInputStream fstream;
    private DataInputStream in;
    private BufferedReader br;

    public IF_Parser(String filename) throws IOException
    {
        try
        {
            fstream = new FileInputStream(filename);
            // Get the object of DataInputStream
            in = new DataInputStream(fstream);
            br = new BufferedReader(new InputStreamReader(in));
        }
        catch (Exception e)
        {
            System.err.println("Error: " + e.getMessage());
        }
    }

    public void Parse_given_file() throws IOException
    {
        try
        {
            String      strLine;
            int         line        = 1;
            while ((strLine = br.readLine()) != null)   
            {
                System.out.println("Line " + line);
                int i;
                String[] splits     =   strLine.split("\t");
                // splits[0] : int value, splits[1] : string representation of list of postings.
                String[] postings   =   splits[1].split(" ");

                line++;
            }
        }
        catch (Exception e)
        {
            System.err.println("Error: " + e.getMessage());
        }
    }
}

Upvotes: 1

Views: 107

Answers (3)

christopher

Reputation: 27356

1   (201, <202,203>), (203, <204,208>), (204, <>)

Remove all of the (, ), < and > characters. Then use split() to get the individual tokens, and finally parse them as integers.

Example

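String input = scanner.nextLine(); // your input, e.g. read with a java.util.Scanner

// replaceAll takes a regex, so use a character class to strip every bracket.
input = input.replaceAll("[()<>]", "");

// Split on runs of commas and/or whitespace so that empty lists leave no blank tokens.
String[] tokens = input.trim().split("[,\\s]+");

int[] values = new int[tokens.length];

for(int x = 0; x < tokens.length; x++)
{
    values[x] = Integer.parseInt(tokens[x]);
}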

Upvotes: 0

user2881767

Reputation: 300

You can use the StringTokenizer class as well. The code is as simple as this:

import java.util.StringTokenizer;

public class App {
    public static void main(String[] args) {
        String str = "1   (201, <202,203>), (203, <204,208>), (204, <>)";
        // Every character in the second argument is treated as a delimiter.
        StringTokenizer st = new StringTokenizer(str, " ,()<>");

        while (st.hasMoreElements()) {
            System.out.println(st.nextElement());
        }
    }
}

This prints:

1
201
202
203
203
204
208
204
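If you need the values as int rather than printed text, one possible follow-up is to pass each token through Integer.parseInt. This is only a minimal sketch, assuming every token is a valid integer. The names TokenizerToInts and toInts are made up for illustration, and the tab character is added to the delimiters in case the first column is tab-separated as in the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerToInts {
    // Hypothetical helper: tokenizes one line and parses each token as an int.
    static List<Integer> toInts(String line) {
        List<Integer> values = new ArrayList<>();
        // Space, tab, comma and all four brackets act as delimiters.
        StringTokenizer st = new StringTokenizer(line, " \t,()<>");
        while (st.hasMoreTokens()) {
            // nextToken() returns a String; parseInt throws NumberFormatException on bad input.
            values.add(Integer.parseInt(st.nextToken()));
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [1, 201, 202, 203, 203, 204, 208, 204]
        System.out.println(toInts("1   (201, <202,203>), (203, <204,208>), (204, <>)"));
    }
}
```

Empty lists such as <> simply produce no tokens, so they need no special handling here.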

Upvotes: 1

npinti

Reputation: 52205

Since you want to extract the numeric values from each line, I would recommend taking a look at the Pattern class. A simple piece of code like the one below:

   // requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
   String str = "1   (201, <202,203>), (203, <204,208>), (204, <>)";
   Pattern p = Pattern.compile("(\\d+)");
   Matcher m = p.matcher(str);
   while(m.find())
   {
       System.out.println(m.group(1));
   }

Will yield all the numeric values in the line:

1
201
202
203
203
204
208
204

Essentially, that pattern matches one or more consecutive digits. Each time find() succeeds, the matched digits are captured in group 1, which the loop then reads with m.group(1).
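To tie this back to reading the file line by line, here is a rough sketch that applies the same idea to each line from a BufferedReader and collects the numbers as Integer values. The class and method names (RegexLineParser, parseLine, parseAll) are my own invention, and the sample uses a StringReader in place of a real file. It assumes each number fits in an int.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLineParser {
    // Compile once and reuse; matches one or more consecutive digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns every number found in one line, in order.
    static List<Integer> parseLine(String line) {
        List<Integer> values = new ArrayList<>();
        Matcher m = DIGITS.matcher(line);
        while (m.find()) {
            // group() with no argument returns the whole match.
            values.add(Integer.parseInt(m.group()));
        }
        return values;
    }

    // Hypothetical helper: parses every line the reader supplies.
    static List<List<Integer>> parseAll(BufferedReader reader) throws IOException {
        List<List<Integer>> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            rows.add(parseLine(line));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file; swap in a FileReader for real input.
        String sample = "5\t(202, <>), (208, <>)\n6\t(202, <>)\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (List<Integer> row : parseAll(reader)) {
                System.out.println(row);
            }
        }
    }
}
```

Note that this flattens each line into a plain sequence, so which numbers were inside a <...> list is lost. If you need that structure, you would have to match the parentheses and angle brackets as well.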

Upvotes: 3
