Reputation: 7603
I am processing a text file that contains up to a thousand lines. There are multiple headers and footers in one file, so I don't need to process the lines containing @h and @f; they mark the beginning and end of a transaction (a database transaction, I will save those records to the DB in one transaction).
A sample record is below. The file can reach up to a thousand lines and each line has up to 40 columns. From each line I am only looking for a few specific fields (e.g. I need the name from position 8 to 30, the year from position 60 to 67, and the like). These positions might fall next to a space or inside a string. So I don't want to put the whole content of each line into a buffer/memory to process it, because I am only interested in a few fields. Does a CSV file allow getting data from a specific position in a line? What should I use to get the best performance (to process the data as quickly as possible without using much memory)? I am using Java.
@h Header
@074VH01MATT TARA A5119812073921 RONG HI DE BET IA76200 201108222 0500 *
@074VH01KAYT DJ A5119812073921 RONG DED CR BET IA71200 201108222 0500 *
@f Footer
@h Header
@074VH01MATT TARA A5119812073921 RONG HI DE BET IA76200 201108222 0500 *
@074VH01KAYT DJ A5119812073921 RONG DED CR BET IA71200 201108222 0500 *
@f Footer
Upvotes: 2
Views: 34528
Reputation: 760
Here is my solution:
import java.io.*;

class ReadAFileLineByLine
{
    public static void main(String args[])
    {
        try {
            FileInputStream fstream = new FileInputStream("textfile.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
            String strLine;
            // Loop through and check whether it is a header or footer line; if not,
            // assign the substring to a temp variable and print it.
            while ((strLine = br.readLine()) != null) {
                if (!(strLine.charAt(1) == 'h' || strLine.charAt(1) == 'f')) {
                    String tempName = strLine.substring(8, 31);
                    System.out.println(tempName);
                }
            }
            // Close the input stream
            br.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Is something like this what you're looking for?
Upvotes: 5
Reputation: 9971
Use a BufferedReader, so the whole file isn't held in memory, constructed from an InputStreamReader so you can specify the character set (as the JavaDoc for FileReader tells you to do). My example below uses UTF-8, assuming the file is in that encoding.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class StringData {

    public static void main(String[] args) throws Exception {
        BufferedReader br = null;
        try {
            // change this value
            FileInputStream fis = new FileInputStream("/path/to/StringData.txt");
            br = new BufferedReader(new InputStreamReader(fis, "UTF-8"));
            String sCurrentLine;
            while ((sCurrentLine = br.readLine()) != null) {
                processLine(sCurrentLine);
            }
        } finally {
            if (br != null) br.close();
        }
    }

    public static void processLine(String line) {
        // skip header & footer
        if (line.startsWith("@h Header") || line.startsWith("@f Footer")) return;
        String name = line.substring(8, 22);
        String year = line.substring(63, 67);
        System.out.println("Name [" + name + "]\t Year [" + year + "]");
    }
}
Output
Name [MATT TARA ] Year [2011]
Name [KAYT DJ ] Year [2011]
Upvotes: 4
Reputation: 3158
Don't worry about memory; you can put the whole file in one char array without anybody noticing. CSV files are a pain and won't do anything for you. Just read each row into a buffer--a String, or char or byte array--and grab from it what you need; the fixed positioning makes it easy.
In general, there's a tradeoff between memory and time. I've found that big buffers, say 100 KB to over 1 MB as opposed to, say, 10 KB, can speed you up 5 to 10 times. (Test it yourself with various sizes if it matters. If I understand you right, you're talking about roughly 40 KB, so there is no need for a buffer bigger than that. If it's 40 MB, then do the tests; even a 40 MB array won't hurt you, but at that point you are starting to waste memory.) Just be sure to close the file and release references to the file class(es) before going on to other work, so your buffers etc. do not become a memory leak.
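If it helps, here is a minimal sketch of that idea: reading line by line with an explicitly sized BufferedReader buffer. The 128 KB size, the file name, and the substring offsets are assumptions for illustration only; tune them to your own data.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class BigBufferRead {
    public static void main(String[] args) throws IOException {
        int bufferSize = 128 * 1024; // 128 KB; the BufferedReader default is 8 KB
        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("textfile.txt"), "UTF-8"),
                bufferSize);
        try {
            String line;
            while ((line = br.readLine()) != null) {
                // skip header and footer lines
                if (line.startsWith("@h") || line.startsWith("@f")) continue;
                // grab the fixed-position field(s) you care about
                System.out.println(line.substring(8, 31));
            }
        } finally {
            br.close(); // release the file handle so the buffer can be reclaimed
        }
    }
}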
Upvotes: 0
Reputation: 715
I don't think CSV is a must. How are you reading the file, line by line or all at once? I would go with line by line; that way, reading each line is not costly in memory (only one line at a time). You can use a regex on each line and take only the groups you need (with Pattern and Matcher) to extract exactly what you want.
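For example, a rough sketch with Pattern and Matcher against one of the sample lines. The character counts in the regex (8, 23, 29, 8) are assumptions derived from the positions mentioned in the question, not the real record layout.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedPositionRegex {
    // skip 8 chars, capture 23 chars (name), skip 29 chars, capture 8 chars (year)
    private static final Pattern RECORD = Pattern.compile("^.{8}(.{23}).{29}(.{8})");

    public static void main(String[] args) {
        String line = "@074VH01MATT TARA A5119812073921 RONG HI DE BET IA76200 201108222 0500 *";
        if (line.startsWith("@h") || line.startsWith("@f")) {
            return; // header/footer line, nothing to extract
        }
        Matcher m = RECORD.matcher(line);
        if (m.find()) {
            System.out.println("name = [" + m.group(1) + "]");
            System.out.println("year = [" + m.group(2) + "]");
        }
    }
}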
Upvotes: 1