Usman Riaz

Reputation: 3020

Splitting a string on the basis of a regex

I have the following text file:

"Zanesville,OH"        +39.93830        -82.00830      84ZC  PMNQ
"Zaragoza,Spain"        +41.66670         -1.05000      GWC7  PXB0
"Zurich,Switzerland"        +47.36670         +8.53330      HP9Z  QVT0
"Zwickau,Germany"        +50.70000        +12.50000      J17H  RFH0

Now I want the values in each line. There are many spaces between the values. I know that a regex can be used to get them, but I am unable to write one. The code I am using to read the file is this:

File file = new File("C:\\Users\\user\\Desktop\\files\\cities.txt");
if (file.exists()) {
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
    String line = "";
    while ((line = br.readLine()) != null) {
        String[] token = line.split(" ");

    }
}

Can anyone tell me how I can get the values?

Upvotes: 0

Views: 214

Answers (3)

ashokramcse

Reputation: 2861

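You can use line.split("\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)") to get the desired output. The pattern splits on runs of whitespace only when an even number of double quotes follows, that is, only when the whitespace lies outside a quoted value.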
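
As a rough sketch of how this could be applied to one line at a time (the class name QuotedSplitDemo and the method splitFields are made up for illustration, and the sample line is taken from the question):

```java
import java.util.Arrays;

class QuotedSplitDemo {
    // Runs of whitespace followed by an even number of double quotes,
    // i.e. whitespace that lies outside any quoted field.
    static final String OUTSIDE_QUOTES = "\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: split a single line into its fields.
    static String[] splitFields(String line) {
        return line.split(OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String line = "\"Zaragoza,Spain\"        +41.66670         -1.05000      GWC7  PXB0";
        System.out.println(Arrays.toString(splitFields(line)));
        // prints ["Zaragoza,Spain", +41.66670, -1.05000, GWC7, PXB0]
    }
}
```

Note that the quotes stay on the first field; strip them separately if you only want the city name.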

Upvotes: 1

Avinash Raj

Reputation: 174696

Just split the input on the regex below (shown as a Java string literal):

\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

Code:

// Requires: import java.util.Arrays;
String s = "\"Zanesville,OH\"        +39.93830        -82.00830      84ZC  PMNQ\n" + 
        "\"Zaragoza,Spain\"        +41.66670         -1.05000      GWC7  PXB0\n" + 
        "\"Zurich,Switzerland\"        +47.36670         +8.53330      HP9Z  QVT0\n" + 
        "\"Zwickau,Germany, United States\"        +50.70000        +12.50000      J17H  RFH0";
// Same pattern with possessive quantifiers (*+), which avoid needless backtracking.
String[] tok = s.split("\\s+(?=(?:[^\"]*+\"[^\"]*+\")*+[^\"]*+$)");
System.out.println(Arrays.toString(tok));

Output (wrapped for readability):

["Zanesville,OH", +39.93830, -82.00830, 84ZC, PMNQ,
"Zaragoza,Spain", +41.66670, -1.05000, GWC7, PXB0,
"Zurich,Switzerland", +47.36670, +8.53330, HP9Z, QVT0,
"Zwickau,Germany, United States", +50.70000, +12.50000, J17H, RFH0]

Upvotes: 4

Joop Eggen

Reputation: 109547

A more generic solution for Excel-like CSV

This looks like it was originally tab-separated text in which the tabs were replaced by multiple spaces. The double quotes suggest CSV as exported from Excel.

As text between double quotes may contain a line break (multi-line text), I start off with the entire text.

String encoding =  "Windows-1252"; // English, best would be "UTF-8".
byte[] textAsBytes = Files.readAllBytes(file.toPath());
String text = new String(textAsBytes, encoding);

Excel uses "\r\n" for (Windows) line endings, and "\n" inside multi-line text.

String[] lines = text.split("\r\n");
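
A small sketch of why splitting on "\r\n" rather than on any line break matters here. The input string and the names LineSplitDemo and splitRecords are hypothetical, chosen only to illustrate the point:

```java
class LineSplitDemo {
    // Split records on Windows line endings only; a bare "\n" is left alone.
    static String[] splitRecords(String text) {
        return text.split("\r\n");
    }

    public static void main(String[] args) {
        // Hypothetical input: two records, the first with a line break
        // inside its quoted field.
        String text = "\"first\nline\"  1\r\n\"second\"  2";
        String[] records = splitRecords(text);
        System.out.println(records.length); // prints 2
    }
}
```

The embedded "\n" stays inside the first record, so the quoted field survives intact.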

Splitting on multiple spaces with .split(" +") might break inside a quoted field, so I use a pattern instead. The pattern matches either a quoted field, in which any internal quote is escaped by doubling it, or a sequence of non-whitespace characters.

// Requires java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern.
Pattern pattern = Pattern.compile("\"([^\"]|\"\")*\"|\\S+");
for (String line : lines) {
     List<String> fields = new ArrayList<>();
     Matcher m = pattern.matcher(line);
     while (m.find()) {
         String field = m.group();
         if (field.startsWith("\"") && field.endsWith("\"") && field.length() >= 2) {
             field = field.substring(1, field.length() - 1); // Strip quotes.
             field = field.replace("\"\"", "\""); // Unescape inner quotes.
         }
         fields.add(field);
     }
     ...
}
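
For anyone who wants to try this on a single line, here is a self-contained sketch of the same idea. The class name FieldMatcherDemo and the method parseFields are invented for the example, the sample line is from the question, and a non-capturing group is used in the pattern since the group's content is not needed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldMatcherDemo {
    // Either a quoted field (inner quotes doubled) or a run of non-whitespace.
    static final Pattern FIELD = Pattern.compile("\"(?:[^\"]|\"\")*\"|\\S+");

    // Hypothetical helper: extract the fields of one line, without outer quotes.
    static List<String> parseFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String field = m.group();
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1); // strip outer quotes
                field = field.replace("\"\"", "\"");            // unescape doubled quotes
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"Zurich,Switzerland\"        +47.36670         +8.53330      HP9Z  QVT0";
        System.out.println(parseFields(line));
        // prints [Zurich,Switzerland, +47.36670, +8.53330, HP9Z, QVT0]
    }
}
```

Unlike the split-based answers, this removes the surrounding quotes and also copes with doubled quotes inside a field.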

Upvotes: 1
