Reputation: 3
I'm writing a program that takes data from rows in a text file. The problem is that the file is not well written, and writing a parser for it has been confusing.
Here are two such rows. For both I can get the address, latitude and longitude, but on the second one I cannot get the price or size(s). The error I keep getting is a string index out of bounds exception of -41 (seriously).
|12091805|,|0|,|DETAILS|,||,||,|Latitude:54.593406, Longitude:-5.934344 <b >Unit 8 Great Northern Mall Great Victoria Street Belfast Down<//b><p><p><p>Price : 150,000<p>Size: 2,411 Sq Feet ()<p>Rent : 50,500 Per Annum<p><p>Text<p><p>|,||,||
|15961081|,|0|,|DETAILS|,||,||,|<p>Latitude:54.593406, Longitude:-5.934344 <b>3-5 Market Street Lurgan BT66</b> </p> <p> </p> <p> </p> <p> Price : £250,000 </p> <p> Size: 0.173 acres (0.07ha) </p> <p> </p> <p> Text </p> <p> </p> <p> Text </p> <p> </p> <p> Text </p> <p> </p> <p> </p>|,||,||
The rows are a lot longer, but I changed the paragraphs to just say "Text" for now.
And no, I cannot rewrite the text file. Any pointers would be appreciated.
if (s.contains("Price"))
{
    int pstart = 0;
    int pend = 0;
    if (s.contains("<p>Size"))
    {
        //if has pound symbol
        if (s.contains("£"))
        {
            String[] str = s.split("£");
            StringBuilder bs = new StringBuilder();
            for (String st : str)
            {
                bs.append(st);
            }
            pstart = bs.indexOf("Price") + 8;
            pend = bs.indexOf("</p>") - 1;
        }
        else
        {
            pstart = s.indexOf("Price") + 8;
            pend = s.indexOf("<p>Size");
        }
        String sp = s.substring(pstart, pend);
        String[] spl = sp.split(",");
        StringBuilder build = new StringBuilder();
        for (String st : spl)
        {
            build.append(st);
            f = build.toString();
        }
        in = Integer.parseInt(f);
        p.setPrice(in);
    }
    else
    {
        if (s.contains("£"))
        {
            String[] str = s.split("£");
            StringBuilder bs = new StringBuilder();
            for (String st : str)
            {
                bs.append(st);
            }
            pstart = bs.indexOf("Price : ");
            pend = bs.indexOf("</p>") - 1;
        }
        else
        {
            pstart = s.indexOf("Price") + 8;
            pend = s.indexOf("<p>Size");
        }
        String sp = s.substring(pstart, pend);
        String[] spl = sp.split(",");
        StringBuilder build = new StringBuilder();
        for (String st : spl)
        {
            build.append(st);
            f = build.toString();
        }
        in = Integer.parseInt(f);
        p.setPrice(in);
    }
}
// if has size property
if (s.contains("Size"))
{
    //if in acres
    if (s.contains("acres"))
    {
        int sstart = s.indexOf("Size:") + 6;
        int send = s.indexOf("acres") - 1;
        String sp = s.substring(sstart, send);
        double d = Double.parseDouble(sp);
        p.setSized(d);
    }
    if (s.contains("()"))
    {
        int sstart = s.indexOf("Size:") + 6;
        int send = s.indexOf("Sq") - 2;
        String sp = s.substring(sstart, send);
        if (sp.contains("-") && sp.contains(","))
        {
            String[] spl = sp.split("-|,");
            StringBuilder str = new StringBuilder();
            str.append(spl[0] + spl[1]);
            StringBuilder str2 = new StringBuilder(0);
            str2.append(spl[2] + spl[3]);
            String s1 = str.toString();
            int i = Integer.parseInt(s1);
            p.setSize(i);
            String s2 = str2.toString();
            i = Integer.parseInt(s2);
            p.setSize2(i);
        }
        if (sp.contains("-"))
        {
            String[] spl = sp.split("-");
            int one = Integer.parseInt(spl[0]);
            p.setSize(one);
            int two = Integer.parseInt(spl[1]);
            p.setSize2(two);
        }
        else if (!(sp.contains("-")))
        {
            if (sp.contains(","))
            {
                String[] spl = sp.split(",");
                StringBuilder build = new StringBuilder();
                for (String st : spl)
                {
                    build.append(st);
                    f = build.toString();
                }
                in = Integer.parseInt(f);
                p.setSize(in);
            }
            else
            {
                p.setSize(Integer.parseInt(sp));
            }
        }
    }
}
v.add(p);
p = new Property();
Upvotes: 0
Views: 201
Reputation: 4286
The approach I would take is to convert character entities (the one for £, for example) to the equivalent text character and filter out the HTML markup (<p> etc.).
For the markup-filtering step, something like this is what I'm thinking: strip all of the HTML markup out of the string before splitting it on the field separator (|).
See also: Remove HTML tags from a String
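A minimal sketch of that idea, assuming every field is wrapped in `|` and fields are separated by commas, as in the two sample rows, and assuming the pound sign may also show up as the `&pound;` entity. The names `ListingCleaner`, `stripMarkup` and `splitFields` are made up for illustration, and the tag-stripping regex is a crude stand-in for a real HTML parser:

```java
import java.util.ArrayList;
import java.util.List;

class ListingCleaner {

    // Replace the pound entity with the literal character (\u00A3 is the
    // pound sign), replace anything that looks like a tag with a space,
    // and collapse runs of whitespace.
    static String stripMarkup(String s) {
        return s.replace("&pound;", "\u00A3")
                .replaceAll("<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Split one row into its fields. Assumes the layout |a|,|b|,||
    // seen in the sample rows; empty fields are kept as empty strings.
    static List<String> splitFields(String row) {
        String cleaned = stripMarkup(row);
        if (cleaned.length() < 2 || !cleaned.startsWith("|") || !cleaned.endsWith("|")) {
            throw new IllegalArgumentException("Row is not wrapped in |: " + row);
        }
        // Drop the outer pipes, then split on the |,| between fields.
        String inner = cleaned.substring(1, cleaned.length() - 1);
        List<String> fields = new ArrayList<>();
        for (String field : inner.split("\\|,\\|", -1)) {
            fields.add(field.trim());
        }
        return fields;
    }
}
```

With the first sample row, field index 5 comes back as one plain-text string ("Latitude:54.593406, Longitude:-5.934344 Unit 8 ... Price : 150,000 Size: 2,411 Sq Feet () ..."), which is much easier to search than the raw markup.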
Upvotes: 0
Reputation: 35068
I'd use regular expressions; the following should point you in the right direction:
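import java.io.BufferedReader;
import java.text.NumberFormat;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Optional pound sign, then a number that may contain grouping commas or a decimal point.
Pattern pricePattern = Pattern.compile("Price\\s*:\\s*(£)?([0-9,.]+)");
Pattern sqFeetPattern = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*Sq");
Pattern acresPattern = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*acres\\s*\\(([0-9,.]+)ha\\)");

// Fix the locale so "," is always the grouping separator and "." the decimal point,
// whatever the default locale of the machine is.
NumberFormat nf = NumberFormat.getNumberInstance(Locale.UK);
nf.setGroupingUsed(true);

// readLine() throws IOException and nf.parse() throws ParseException,
// so the enclosing method has to declare or catch both.
BufferedReader r = new BufferedReader(inputFileReader);
String line;
while ((line = r.readLine()) != null) {
    Matcher m = pricePattern.matcher(line);
    if (m.find()) {
        int price = nf.parse(m.group(2)).intValue();
        System.out.println("Price: " + price);
    }
    m = sqFeetPattern.matcher(line);
    if (m.find()) {
        int sqFeet = nf.parse(m.group(1)).intValue();
        System.out.println("Sq Feet: " + sqFeet);
    }
    m = acresPattern.matcher(line);
    if (m.find()) {
        float acres = nf.parse(m.group(1)).floatValue();
        float ha = nf.parse(m.group(2)).floatValue();
        System.out.println("Acres: " + acres + " ha: " + ha);
    }
}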
N.B. inputFileReader would be defined as a FileReader or similar to get your file.
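The question's code also has a branch for a size range (the one that splits on "-"), which the patterns above do not cover. A rough sketch of how the square-feet pattern could be extended, assuming a range is written like "1,200-2,400 Sq Feet" (that format is a guess based on the question's code, since neither sample row contains one). `SizeRangeParser` and `parseSqFeet` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SizeRangeParser {

    // One number, optionally followed by "-" and a second number, then "Sq".
    private static final Pattern SQ_FEET_RANGE = Pattern.compile(
            "Size\\s*:\\s*([0-9][0-9,]*)(?:\\s*-\\s*([0-9][0-9,]*))?\\s*Sq");

    // Returns {min, max}; both are equal when the row has a single size.
    // Returns null when the row has no square-feet size (acres, for example).
    static int[] parseSqFeet(String line) {
        Matcher m = SQ_FEET_RANGE.matcher(line);
        if (!m.find()) {
            return null;
        }
        int min = toInt(m.group(1));
        int max = m.group(2) == null ? min : toInt(m.group(2));
        return new int[] {min, max};
    }

    // Drop the grouping commas before parsing, e.g. "2,411" becomes 2411.
    private static int toInt(String digits) {
        return Integer.parseInt(digits.replace(",", ""));
    }
}
```

The two values map naturally onto the `setSize` and `setSize2` calls in the question.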
Upvotes: 1