Reputation: 2664
I'm completely new to Java. I find it difficult to manipulate strings efficiently, even though I've learned a lot about the String class and file I/O. After writing some code, I found that the split method of the String class is not a panacea. I'd like to parse a text file like this:
1 (201, <202,203>), (203, <204,208>), (204, <>)
2 (201, <202,203>), (204, <>)
3 (201, <202,203>), (203, <204,208>)
4 (201, <202,203>), (202, <>), (208, <>)
5 (202, <>), (208, <>)
6 (202, <>)
The first column is part of the text file's content, not a line number. After reading the first line, I'd like to receive 1, 201, 202, 203, 203, 204, 208, and 204 as int values, in that order. Which String methods would be a good choice? Thank you in advance.
Code (you may not need it):
import java.io.*;

public class IF_Parser
{
    private FileInputStream fstream;
    private DataInputStream in;
    private BufferedReader br;

    public IF_Parser(String filename) throws IOException
    {
        try
        {
            fstream = new FileInputStream(filename);
            // Get the object of DataInputStream
            in = new DataInputStream(fstream);
            br = new BufferedReader(new InputStreamReader(in));
        }
        catch (Exception e)
        {
            System.err.println("Error: " + e.getMessage());
        }
    }

    public void Parse_given_file() throws IOException
    {
        try
        {
            String strLine;
            int line = 1;
            while ((strLine = br.readLine()) != null)
            {
                System.out.println("Line " + line);
                int i;
                String[] splits = strLine.split("\t");
                // splits[0] : int value, splits[1] : string representation of list of postings.
                String[] postings = splits[1].split(" ");
                line++;
            }
        }
        catch (Exception e)
        {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
Upvotes: 1
Views: 107
Reputation: 27356
1 (201, <202,203>), (203, <204,208>), (204, <>)
Remove all of the () and <> characters. Then use split() to get the individual tokens, and finally parse them as integers.
Example
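String input = scanner.nextLine(); // your input (Scanner has nextLine(), not readLine())
// Strip every bracket character; "\\(\\)" and "<>" would only match the literal pairs "()" and "<>".
input = input.replaceAll("[()<>]", "");
// Split on runs of commas and whitespace so that no token keeps a stray space.
String[] tokens = input.trim().split("[,\\s]+");
int[] values = new int[tokens.length];
for (int x = 0; x < tokens.length; x++)
{
    values[x] = Integer.parseInt(tokens[x]);
}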
Upvotes: 0
Reputation: 300
You can use the StringTokenizer class as well. The code is as simple as follows:
import java.util.StringTokenizer;

public class App {
    public static void main(String[] args) {
        String str = "1 (201, <202,203>), (203, <204,208>), (204, <>)";
        // Every character in the second argument acts as a delimiter.
        StringTokenizer st = new StringTokenizer( str, " ,()<>" );
        while ( st.hasMoreElements() ) {
            System.out.println( st.nextElement() );
        }
    }
}
Output prints:
1
201
202
203
203
204
208
204
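Since the question asks for int values rather than printed strings, the same tokenizer loop can collect parsed numbers into a list. This is only a minimal sketch: the class name TokenizerParser and the method name parseInts are made up for illustration, and a tab is added to the delimiter set on the assumption that the first column may be tab-separated, as the question's code suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of the original answer.
class TokenizerParser {

    // Splits the line on spaces, tabs, commas, parentheses and angle
    // brackets, then parses every remaining token as an int.
    static List<Integer> parseInts(String line) {
        List<Integer> values = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, " \t,()<>");
        while (st.hasMoreTokens()) {
            values.add(Integer.parseInt(st.nextToken()));
        }
        return values;
    }

    public static void main(String[] args) {
        String str = "1 (201, <202,203>), (203, <204,208>), (204, <>)";
        System.out.println(parseInts(str));
        // prints [1, 201, 202, 203, 203, 204, 208, 204]
    }
}
```

Integer.parseInt throws NumberFormatException if a token is not a number, so a line containing other text would need extra handling.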
Upvotes: 1
Reputation: 52205
Since you want to extract the numeric values from each line, I would recommend taking a look at the Pattern class. A simple piece of code like the one below:
String str = "1 (201, <202,203>), (203, <204,208>), (204, <>)";
Pattern p = Pattern.compile("(\\d+)");
Matcher m = p.matcher(str);
while (m.find())
{
    System.out.println(m.group(1));
}
Will yield all the numeric values in the line:
1
201
202
203
203
204
208
204
Essentially, the pattern \d+ matches one or more consecutive digits. Each call to m.find() advances to the next such run, and the parentheses capture it as group 1, which m.group(1) then returns.
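To get int values for every line of the file, the matcher loop can be wrapped in a small helper and applied line by line. The following is a rough sketch under a few assumptions: the names RegexPostingsParser, extractInts and parseFile are hypothetical, the file is assumed to be small enough to read at once with Files.readAllLines, and every run of digits is assumed to fit in an int.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class RegexPostingsParser {

    // Compiled once and reused for every line.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every run of digits in the line, in order, as ints.
    static List<Integer> extractInts(String line) {
        List<Integer> values = new ArrayList<>();
        Matcher m = DIGITS.matcher(line);
        while (m.find()) {
            // group() with no argument returns the whole match.
            values.add(Integer.parseInt(m.group()));
        }
        return values;
    }

    // Reads the whole file and returns one list of ints per line.
    // A blank line produces an empty list.
    static List<List<Integer>> parseFile(Path file) throws IOException {
        List<List<Integer>> rows = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            rows.add(extractInts(line));
        }
        return rows;
    }
}
```

For the first sample line, extractInts would return the list 1, 201, 202, 203, 203, 204, 208, 204, where the first element is the leading column and the rest are the posting values. Note that this approach discards the grouping, so it no longer shows which numbers sat inside which parentheses.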
Upvotes: 3