Reputation: 591
I'm trying to create a program that reads CSV files from a directory, parses each line of a file with a regex, and displays the lines after matching the regex pattern. For instance, if this is the first line of my CSV file:
1997,Ford,E350,"ac, abs, moon",3000.00
my output should be
1997 Ford E350 ac, abs, moon 3000.00
I don't want to use any existing CSV libraries. I'm not good at regex; I used one I found on the net, but it's not working in my program. This is my source code. I'd be grateful if anyone could tell me where and what I have to modify to make it work. Please explain.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexParser {
    private static Charset charset = Charset.forName("UTF-8");
    private static CharsetDecoder decoder = charset.newDecoder();
    String pattern = "\"([^\"]*)\"|(?<=,|^)([^,]*)(?=,|$)";

    void regexparser(CharBuffer cb)
    {
        Pattern linePattern = Pattern.compile(".*\r?\n");
        Pattern csvpat = Pattern.compile(pattern);
        Matcher lm = linePattern.matcher(cb);
        Matcher pm = null;
        while (lm.find())
        {
            CharSequence cs = lm.group();
            if (pm == null)
                pm = csvpat.matcher(cs);
            else
                pm.reset(cs);
            if (pm.find())
            {
                System.out.println(cs);
            }
            if (lm.end() == cb.limit())
                break;
        }
    }

    public static void main(String[] args) throws IOException {
        RegexParser rp = new RegexParser();
        String folder = "Desktop/sample";
        File dir = new File(folder);
        File[] files = dir.listFiles();
        for (File entry : files)
        {
            FileInputStream fin = new FileInputStream(entry);
            FileChannel channel = fin.getChannel();
            int cs = (int) channel.size();
            MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, 0, cs);
            CharBuffer cb = decoder.decode(mbb);
            rp.regexparser(cb);
            fin.close();
        }
    }
}
This is my input file
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
I'm getting the input back unchanged as output. Where is the problem in my code? Why doesn't my regex have any impact on the output?
Upvotes: 0
Views: 4782
Reputation: 591
Anyway I've found the fix myself, thanks guys for your suggestion and help.
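This was my initial code. It only checked whether the line contained a match and then printed the whole line (cs), so the output looked the same as the input:
if (pm.find())
    System.out.println(cs);
I changed it to loop over every match and use the matched text instead:
while (pm.find())
{
    CharSequence css = pm.group();
    // print css
}
I also used a different regex. I'm getting the desired output now.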
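For anyone who lands here later, a minimal runnable sketch of that loop follows. The regex the asker finally used is not shown in the thread, so the pattern below is a hypothetical stand-in, and the names CsvLineDemo, FIELD and splitLine are made up for illustration. It assumes each record sits on a single line, so it does not cover the multi-line "MUST SELL!" record from the sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvLineDemo {
    // Hypothetical pattern (not the asker's): a field starts at the beginning
    // of the line or right after a comma, and ends before a comma or the end.
    // Group 1 = content of a quoted field, group 2 = an unquoted field.
    private static final Pattern FIELD = Pattern.compile(
            "(?:^|(?<=,))(?:\"((?:[^\"]|\"\")*)\"|([^,]*))(?=,|$)");

    // Splits one single-line CSV record into its fields.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {                      // loop over every match, not just the first
            if (m.group(1) != null) {
                // Quoted field: a doubled quote stands for one literal quote.
                fields.add(m.group(1).replace("\"\"", "\""));
            } else {
                fields.add(m.group(2));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "1997,Ford,E350,\"ac, abs, moon\",3000.00";
        // prints: 1997 Ford E350 ac, abs, moon 3000.00
        System.out.println(String.join(" ", splitLine(line)));
    }
}
```

The key point is the same as in the fix above: find() is called repeatedly and the output is built from the captured groups, not from the original line.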
Upvotes: 1
Reputation: 77971
Take the advice offered and do not use regular expressions to parse a CSV file. The format is deceptively complicated in the way it can be used.
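One concrete complication shows up in the question's own sample: the "MUST SELL!" record has a line break inside a quoted field, so splitting the file into lines first cannot work. If a library is ruled out, a small character-by-character scanner is one alternative to a regex. The sketch below is only an illustration under assumptions: the class name CsvScanner and the method parse are made up, blank lines are skipped, and malformed input (such as an unterminated quote) is not reported.

```java
import java.util.ArrayList;
import java.util.List;

class CsvScanner {
    // Parses a whole CSV text into records, each a list of field values.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;    // true while inside a quoted field
        boolean hasContent = false;  // true once the current record has started

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean nextIsQuote = i + 1 < text.length() && text.charAt(i + 1) == '"';
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);          // commas and line breaks are data here
                } else if (nextIsQuote) {
                    field.append('"');        // "" inside quotes is one literal quote
                    i++;
                } else {
                    inQuotes = false;         // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
                hasContent = true;
            } else if (c == ',') {
                record.add(field.toString()); // field separator
                field.setLength(0);
                hasContent = true;
            } else if (c == '\n' || c == '\r') {
                if (hasContent) {             // end of record; blank lines are skipped
                    record.add(field.toString());
                    records.add(record);
                    record = new ArrayList<>();
                    field.setLength(0);
                    hasContent = false;
                }
            } else {
                field.append(c);
                hasContent = true;
            }
        }
        if (hasContent) {                     // last record without a trailing newline
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "1997,Ford,E350,\"ac, abs, moon\",3000.00\n"
                + "1996,Jeep,Grand Cherokee,\"MUST SELL!\nair, moon roof, loaded\",4799.00\n";
        for (List<String> rec : parse(text)) {
            System.out.println(String.join(" ", rec));
        }
    }
}
```

Because the scanner tracks whether it is inside quotes, the embedded line break stays part of the fourth field and the Jeep row comes back as a single record of five fields.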
The following answer contains links to Wikipedia and the RFC describing the CSV file format:
Upvotes: 0
Reputation: 2583
Using a regexp seems "fancy", but for CSV files (at least in my opinion) it is not worth it. For my parsing I use Apache Commons CSV: http://commons.apache.org/csv/. It has never let me down. :)
Upvotes: 2
Reputation: 2984
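You can try this: [ \t]*+"[^"\r\n]*+"[ \t]*+|[^,\r\n]*+
with this code:
// Needs Pattern, Matcher and PatternSyntaxException from java.util.regex.
try {
    Pattern regex = Pattern.compile("[ \t]*+\"[^\"\r\n]*+\"[ \t]*+|[^,\r\n]*+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
    Matcher matcher = regex.matcher(subjectString); // subjectString is the CSV text to scan
    while (matcher.find()) {
        // Do actions, e.g. read the current field with matcher.group().
        // The second alternative can match an empty string, so zero-length matches show up here too.
    }
} catch (PatternSyntaxException ex) {
    // Take care of errors
}
But yeah, if it's not a very critical requirement, do try to use something that already works. :)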
Upvotes: 0