Reputation: 3020
I have the following text file:
"Zanesville,OH" +39.93830 -82.00830 84ZC PMNQ
"Zaragoza,Spain" +41.66670 -1.05000 GWC7 PXB0
"Zurich,Switzerland" +47.36670 +8.53330 HP9Z QVT0
"Zwickau,Germany" +50.70000 +12.50000 J17H RFH0
Now I want the values in each line. There are many spaces between the values. I know that a regex can be used to get them, but I am unable to write one. This is the code I am using to read the file:
File file = new File("C:\\Users\\user\\Desktop\\files\\cities.txt");
if (file.exists()) {
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
    String line = "";
    while ((line = br.readLine()) != null) {
        String token[] = line.split(" ");
    }
}
Can anyone tell me how I can get the values?
Upvotes: 0
Views: 214
Reputation: 2861
You can use line.split("\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)") to get your desired output. The regex splits on whitespace only when that whitespace is outside double quotes.
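To get the values of every line, the same split can be applied inside the reading loop from the question. The following is only a sketch under some assumptions: each line has exactly five fields, the second and third are decimal numbers, and the names CityLines, City, parse and readAll are made up for illustration. It reads from a StringReader to stay self-contained; a reader over the file should work the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CityLines {
    // Hypothetical holder for one line; the meaning of each field is assumed.
    static final class City {
        final String name;
        final double latitude;
        final double longitude;
        final String code1;
        final String code2;

        City(String name, double latitude, double longitude, String code1, String code2) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
            this.code1 = code1;
            this.code2 = code2;
        }
    }

    // Whitespace followed by an even number of double quotes,
    // i.e. whitespace that is outside any quoted field.
    static final String OUTSIDE_QUOTES = "\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static City parse(String line) {
        String[] tok = line.trim().split(OUTSIDE_QUOTES);
        if (tok.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields: " + line);
        }
        String name = tok[0].replace("\"", ""); // drop the surrounding quotes
        return new City(name, Double.parseDouble(tok[1]), Double.parseDouble(tok[2]), tok[3], tok[4]);
    }

    static List<City> readAll(BufferedReader br) throws IOException {
        List<City> cities = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            if (!line.trim().isEmpty()) { // skip blank lines
                cities.add(parse(line));
            }
        }
        return cities;
    }

    public static void main(String[] args) throws IOException {
        String sample = "\"Zanesville,OH\"      +39.93830   -82.00830  84ZC  PMNQ\n"
                + "\"Zaragoza,Spain\"     +41.66670    -1.05000  GWC7  PXB0\n";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            for (City c : readAll(br)) {
                System.out.println(c.name + " | " + c.latitude + " | " + c.longitude
                        + " | " + c.code1 + " | " + c.code2);
            }
        }
    }
}
```

Double.parseDouble accepts the leading "+" sign, so the coordinates need no extra cleanup.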
Upvotes: 1
Reputation: 174696
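Just split the input on the regex below. It matches a run of whitespace only when an even number of double quotes follows it, that is, when the whitespace lies outside a quoted field: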
\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
Code:
import java.util.Arrays;
String s = "\"Zanesville,OH\" +39.93830 -82.00830 84ZC PMNQ\n" +
        "\"Zaragoza,Spain\" +41.66670 -1.05000 GWC7 PXB0\n" +
        "\"Zurich,Switzerland\" +47.36670 +8.53330 HP9Z QVT0\n" +
        "\"Zwickau,Germany, United States\" +50.70000 +12.50000 J17H RFH0";
// Same regex with possessive quantifiers (*+), which avoid backtracking inside the lookahead.
String[] tok = s.split("\\s+(?=(?:[^\"]*+\"[^\"]*+\")*+[^\"]*+$)");
System.out.println(Arrays.toString(tok));
Output (on a single line, because \s+ also matches the \n between records):
["Zanesville,OH", +39.93830, -82.00830, 84ZC, PMNQ, "Zaragoza,Spain", +41.66670, -1.05000, GWC7, PXB0, "Zurich,Switzerland", +47.36670, +8.53330, HP9Z, QVT0, "Zwickau,Germany, United States", +50.70000, +12.50000, J17H, RFH0]
Upvotes: 4
Reputation: 109547
A more generic solution for Excel-like CSV
This looks like it was originally tab-separated text whose tabs were replaced by multiple spaces. The double quotes suggest CSV as exported from Excel.
As text between double quotes may contain a line break (multi-line text), I start off with the entire text.
String encoding = "Windows-1252"; // English, best would be "UTF-8".
byte[] textAsBytes = Files.readAllBytes(file.toPath());
String text = new String(textAsBytes, encoding);
Excel uses "\r\n" for (Windows) line endings, and "\n" inside multi-line text.
String[] lines = text.split("\r\n");
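A small illustration of why splitting on "\r\n" keeps multi-line fields intact, assuming the file really uses "\r\n" between records and a bare "\n" only inside quoted fields. The sample text and the RecordSplit name are invented for the example:

```java
class RecordSplit {
    // Splits on Windows line endings only, so a lone "\n" inside a
    // quoted field stays part of its record.
    static String[] records(String text) {
        return text.split("\r\n");
    }

    public static void main(String[] args) {
        String text = "\"first line\nsecond line\" A1\r\n\"single\" B2\r\n";
        String[] lines = records(text);
        System.out.println(lines.length); // prints 2
    }
}
```

The trailing "\r\n" does not produce an extra empty record, because split drops trailing empty strings.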
Splitting on multiple spaces with .split(" +") might break inside a quoted field, so I use a pattern.
This pattern matches either a quoted field, in which any internal quote is self-escaped as two quotes, or a sequence of non-whitespace characters.
// A quoted field (inner quotes doubled) or a run of non-whitespace.
Pattern pattern = Pattern.compile("\"([^\"]|\"\")*\"|\\S+");
for (String line : lines) {
    List<String> fields = new ArrayList<>();
    Matcher m = pattern.matcher(line);
    while (m.find()) {
        String field = m.group();
        if (field.startsWith("\"") && field.endsWith("\"") && field.length() >= 2) {
            field = field.substring(1, field.length() - 1); // Strip quotes.
            field = field.replace("\"\"", "\""); // Unescape inner quotes.
        }
        fields.add(field);
    }
    ...
}
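For reference, the matching loop can be wrapped into a self-contained sketch like the one below. The class name FieldTokenizer, the method name fields and the sample lines are hypothetical, and the pattern uses a non-capturing group, which is a choice of this sketch rather than something the approach requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldTokenizer {
    // Either a quoted field (an inner quote is written as two quotes)
    // or a run of non-whitespace characters.
    private static final Pattern FIELD = Pattern.compile("\"(?:[^\"]|\"\")*\"|\\S+");

    static List<String> fields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String field = m.group();
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                // Strip the surrounding quotes and unescape doubled quotes.
                field = field.substring(1, field.length() - 1).replace("\"\"", "\"");
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        // One field per output line, so commas inside a field stay unambiguous.
        for (String f : fields("\"Zwickau,Germany\"   +50.70000  +12.50000 J17H RFH0")) {
            System.out.println(f);
        }
        for (String f : fields("\"say \"\"hi\"\"\"  X1")) {
            System.out.println(f);
        }
    }
}
```

Unlike a plain split, this also removes the surrounding quotes and turns doubled quotes back into single ones.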
Upvotes: 1