Reputation: 4130
I'm trying to create a regex pattern to match the lines in the following format:
field[bii] = float4:.4f_degree // Galactic Latitude
field[class] = int2 (index) // Browse Object Classification
field[dec] = float8:.4f_degree (key) // Declination
field[name] = char20 (index) // Object Designation
field[dircos1] = float8 // 1st Directional Cosine
I came up with this pattern, which seemed to work, then suddenly seemed NOT to work:
field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))
Here is the code I'm trying to use (edit: provided full method instead of excerpt):
private static Map<String, String> createColumnMap(String filename) {
    // create a linked hashmap mapping field names to their column types. Use LHM because I'm picky and
    // would prefer to preserve the order
    Map<String, String> columnMap = new LinkedHashMap<String, String>();
    // define the regex patterns
    Pattern columnNamePattern = Pattern.compile(columnNameRegexPattern);
    try {
        Scanner scanner = new Scanner(new FileInputStream(filename));
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (line.indexOf("field[") != -1) {
                // get the field name
                Matcher fieldNameMatcher = columnNamePattern.matcher(line);
                String fieldName = null;
                if (fieldNameMatcher.find()) {
                    fieldName = fieldNameMatcher.group(1);
                }
                String columnName = null;
                String columnType = null;
                String columnPrecision = null;
                String columnScale = null;
                //Pattern columnTypePattern = Pattern.compile(".*(float|int|char)([0-9]|[1-9][0-9])");
                Pattern columnTypePattern = Pattern.compile("field\\[(.*)\\] = (float|int|char).*([0-9]|[1-9][0-9]).*(:(\\.([0-9])))");
                Matcher columnTypeMatcher = columnTypePattern.matcher(line);
                System.out.println(columnTypeMatcher.lookingAt());
                if (columnTypeMatcher.lookingAt()) {
                    System.out.println(fieldName + ": " + columnTypeMatcher.groupCount());
                    int count = columnTypeMatcher.groupCount();
                    if (count > 1) {
                        columnName = columnTypeMatcher.group(1);
                        columnType = columnTypeMatcher.group(2);
                    }
                    if (count > 2) {
                        columnScale = columnTypeMatcher.group(3);
                    }
                    if (count >= 6) {
                        columnPrecision = columnTypeMatcher.group(6);
                    }
                }
                int precision = Integer.parseInt(columnPrecision);
                int scale = Integer.parseInt(columnScale);
                if (columnType.equals("int")) {
                    if (precision <= 4) {
                        columnMap.put(fieldName, "INTEGER");
                    } else {
                        columnMap.put(fieldName, "BIGINT");
                    }
                } else if (columnType.equals("float")) {
                    if (columnPrecision==null) {
                        columnMap.put(fieldName,"DECIMAL(8,4)");
                    } else {
                        columnMap.put(fieldName,"DECIMAL(" + columnPrecision + "," + columnScale + ")");
                    }
                } else {
                    columnMap.put(fieldName,"VARCHAR("+columnPrecision+")");
                }
            }
            if (line.indexOf("<DATA>") != -1) {
                scanner.close();
                break;
            }
        }
        scanner.close();
    } catch (FileNotFoundException e) {
    }
    return columnMap;
}
When I get the groupCount from the Matcher object, it says there are 6 groups. However, they aren't matching the text, so I could definitely use some help... can anyone assist?
Upvotes: 0
Views: 540
Reputation: 1643
It's not entirely clear to me what you're after, but I came up with the following pattern, which accepts all of your input examples (written as a Java string literal, hence the doubled backslashes):
field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?
using this code:
String columnName = null;
String columnType = null;
String columnPrecision = null;
String columnScale = null;
// Pattern columnTypePattern =
// Pattern.compile(".*(float|int|char)([0-9]|[1-9][0-9])");
// field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))
Pattern columnTypePattern = Pattern
        .compile("field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?");
Matcher columnTypeMatcher = columnTypePattern.matcher(line);
boolean match = columnTypeMatcher.lookingAt();
System.out.println("Match: " + match);
if (match) {
    int count = columnTypeMatcher.groupCount();
    if (count > 1) {
        columnName = columnTypeMatcher.group(1);
        columnType = columnTypeMatcher.group(2);
    }
    if (count > 2) {
        columnScale = columnTypeMatcher.group(3);
    }
    if (count > 4) {
        columnPrecision = columnTypeMatcher.group(5);
    }
    System.out.println("Name=" + columnName + "; Type=" + columnType + "; Scale=" + columnScale + "; Precision=" + columnPrecision);
}
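I think the problem with your regex was that it required the scale and the ":.N" precision suffix on every line. Both need to be optional, because lines such as field[class] = int2 (index) have no precision suffix at all.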
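One detail worth checking alongside this: as far as I know, Matcher.groupCount() reports how many capturing groups the pattern declares, not how many took part in the match, so the count checks above pass for every line. An optional group that was skipped comes back as null from group(n). Here is a minimal sketch of that behaviour; the class name GroupCountDemo and the helper groupOrDefault are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Same pattern as in the answer: the size and the ":.N" suffix are optional.
    static final Pattern FIELD = Pattern.compile(
            "field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?");

    // Hypothetical helper: returns the text captured by the given group, or the
    // fallback when the line does not match or the optional group was skipped.
    static String groupOrDefault(String line, int group, String fallback) {
        Matcher m = FIELD.matcher(line);
        if (!m.lookingAt()) {
            return fallback;
        }
        String value = m.group(group);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        Matcher m = FIELD.matcher("field[dircos1] = float8 // 1st Directional Cosine");
        System.out.println(m.lookingAt());  // true
        System.out.println(m.groupCount()); // 5, the number of groups in the pattern
        System.out.println(m.group(5));     // null, the ":.N" suffix is absent
    }
}
```

With a helper along these lines, the caller can test for a missing precision before handing the value to Integer.parseInt, which throws NumberFormatException on null.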
Upvotes: 1
Reputation: 120506
field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))
The .* is overly broad, there is a lot of redundancy in ([0-9]|[1-9][0-9]), and I think the parenthetical group that starts with : and the preceding .* should be optional.
After removing all the ambiguity, I get
field\[([^\]]*)\] = (float|int|char)(0|[1-9][0-9]*)(?:[^:]*(:(\.([0-9]+))))?
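To see how a pattern of this shape behaves on the sample lines from the question, here is a rough sketch. It writes the size as (0|[1-9][0-9]*) so that single-digit sizes such as float4 match. The class name FieldLineDemo and the method describe are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldLineDemo {
    // Groups: 1 = field name, 2 = type, 3 = size, 6 = digits after ":." (optional).
    static final Pattern FIELD = Pattern.compile(
            "field\\[([^\\]]*)\\] = (float|int|char)(0|[1-9][0-9]*)(?:[^:]*(:(\\.([0-9]+))))?");

    // Hypothetical helper: summarises a line as "name type size decimals",
    // using "-" when there is no ":.N" suffix, or returns null if the line
    // does not match at all.
    static String describe(String line) {
        Matcher m = FIELD.matcher(line);
        if (!m.lookingAt()) {
            return null;
        }
        String decimals = m.group(6) != null ? m.group(6) : "-";
        return m.group(1) + " " + m.group(2) + " " + m.group(3) + " " + decimals;
    }

    public static void main(String[] args) {
        String[] lines = {
            "field[bii] = float4:.4f_degree // Galactic Latitude",
            "field[class] = int2 (index) // Browse Object Classification",
            "field[dec] = float8:.4f_degree (key) // Declination",
            "field[name] = char20 (index) // Object Designation",
            "field[dircos1] = float8 // 1st Directional Cosine"
        };
        for (String line : lines) {
            System.out.println(describe(line));
        }
    }
}
```

Using [^\]]* for the field name instead of .* keeps the first group from running past the closing bracket, and the non-capturing (?: ... )? wrapper lets lines without a ":.N" suffix still match.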
Upvotes: 0