Reputation: 21749

How to parse such text?

            id   no, no2, list
            id1 (3, 5,  [t[0][66], y[5][626]])
            id2 (3, 5,  [t[0][66], y[5][626], z[5][626]])
            id2 (3, 5,  [t[0][66], y[5][626]])
            id3 (32, 54,  [t[0][66], y[5][626]])
            id4 (3, 541,  [t[0][66], y[5][626], u[5][626], y[25][6226]])
            id5 (3, 52,  [t[0][66], y[5][626]])
            id6 (23, 5,  [t[0][66], y[5][626]])

How would I go about parsing such text? I tried to build an object from each line without much success. The list can vary in size. Java code would be ideal, but any language, pseudocode, or a regular expression is fine.

Upvotes: 0

Answers (3)

user472308

Not your language, but here is a version in Python:

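import re

def split_nonempty(pattern, text):
    # Split text on the regex pattern and drop the empty pieces.
    return [s for s in re.split(pattern, text) if s]

def parse(fname):
    with open(fname) as f:
        data = f.read().splitlines()

    # Header line: column names separated by commas and/or spaces.
    header = split_nonempty('[, ]+', data[0])
    print(header)

    for line in data[1:]:
        if not line.strip():
            continue                    # Skip blank lines

        # Split on whitespace, then strip ( ) , from each field.
        fields = [split_nonempty('[(),]+', field)[0]
                  for field in line.split()]

        fields[3] = fields[3][1:]       # Remove the list's leading [
        fields[-1] = fields[-1][:-1]    # Remove the list's trailing ]

        print(fields[0], fields[1], fields[2], fields[3:])

parse("file")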

Output ('file' contains your text):

$ python parse.py
['id', 'no', 'no2', 'list']
id1 3 5 ['t[0][66]', 'y[5][626]']
id2 3 5 ['t[0][66]', 'y[5][626]', 'z[5][626]']
id2 3 5 ['t[0][66]', 'y[5][626]']
id3 32 54 ['t[0][66]', 'y[5][626]']
id4 3 541 ['t[0][66]', 'y[5][626]', 'u[5][626]', 'y[25][6226]']
id5 3 52 ['t[0][66]', 'y[5][626]']
id6 23 5 ['t[0][66]', 'y[5][626]']

Upvotes: 2

waTeim

Reputation: 9225

There is really no reason to create a parser by hand, as multiple parser generators are available; JavaCC is a popular one for Java. A skeleton process is:

1. Define the language using BNF.
2. Translate the BNF into the input language the parser generator understands, making the rules either left recursive or right recursive as the tool requires. JavaCC generates top-down (LL) parsers, so it requires right recursion.
3. Invoke the parser generator to create the parser classes.
4. Augment the generated source code by inserting code into, or refining, the generator source.

There are many examples available online.
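
To make steps 1 and 2 concrete, here is a rough sketch of what a right-recursive grammar for the sample input could look like. The rule names (row, items, item) and the token kinds are assumptions made for illustration, not something taken from the question. The Python below is not JavaCC output; it is a small hand-written recursive-descent parser that only mirrors the shape of the grammar, which is roughly the structure a generator would produce as Java classes.

```python
import re

# Grammar sketch (right-recursive; rule names are illustrative):
#   row   ::= ID "(" NUM "," NUM "," "[" items "]" ")"
#   items ::= item | item "," items
#   item  ::= ID "[" NUM "]" "[" NUM "]"
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    """Split a line into (kind, value) pairs: NUM, ID, or a punctuation character."""
    tokens = []
    for num, ident, punct in TOKEN.findall(text.strip()):
        if num:
            tokens.append(("NUM", int(num)))
        elif ident:
            tokens.append(("ID", ident))
        else:
            tokens.append((punct, punct))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the kind of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        """Consume the next token, which must be of the given kind, and return its value."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind} at token {self.pos}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def row(self):
        row_id = self.expect("ID")
        self.expect("(")
        no = self.expect("NUM")
        self.expect(",")
        no2 = self.expect("NUM")
        self.expect(",")
        self.expect("[")
        items = self.items()
        self.expect("]")
        self.expect(")")
        return row_id, no, no2, items

    def items(self):
        # items ::= item | item "," items  (the right-recursive rule)
        first = self.item()
        if self.peek() == ",":
            self.expect(",")
            return [first] + self.items()
        return [first]

    def item(self):
        name = self.expect("ID")
        self.expect("[")
        i = self.expect("NUM")
        self.expect("]")
        self.expect("[")
        j = self.expect("NUM")
        self.expect("]")
        return name, i, j


def parse_line(line):
    return Parser(tokenize(line)).row()


print(parse_line("id2 (3, 5,  [t[0][66], y[5][626], z[5][626]])"))
```

A line that does not fit the grammar, such as the header row, raises SyntaxError in this sketch, so a caller would skip the first line or catch the exception.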

Upvotes: 0

StephaneM

Reputation: 4899

I tried to write a regex to extract the data, but I don't have time to finish it.

Here's what I have so far (escaped as a Java string literal): "id(\\d) \\((\\d*), (\\d*),\\s*\\,*\\[(\\,*\\s*(\\D)\\[(\\d*)\\]\\[(\\d*)\\])*.*\\]\\)"

Use an online regex tester to refine it.

The 1st group is the id number, the 2nd group is no, the 3rd group is no2, and the list items should follow in the remaining groups.
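
One caveat with a single pattern: in both Java and Python, a capture group inside a repetition keeps only its last match, so a variable-length list cannot be read back from the groups of one match. A possible workaround, sketched here in Python under that assumption, is to match the row first and then scan the list text separately. The names ROW, ITEM and parse_row are made up for this illustration.

```python
import re

# Row pattern: id, two integers, then the raw text between the outer [ ].
ROW = re.compile(r"(\w+)\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*\[(.*)\]\s*\)")
# Item pattern: a name followed by two bracketed integers, e.g. t[0][66].
ITEM = re.compile(r"(\w+)\[(\d+)\]\[(\d+)\]")


def parse_row(line):
    """Return (id, no, no2, items) for one data line, or None if it does not match."""
    m = ROW.search(line)
    if m is None:
        return None
    row_id, no, no2, raw_list = m.groups()
    # Second pass: pull every item out of the captured list text.
    items = [(name, int(i), int(j)) for name, i, j in ITEM.findall(raw_list)]
    return row_id, int(no), int(no2), items


print(parse_row("id4 (3, 541,  [t[0][66], y[5][626], u[5][626], y[25][6226]])"))
```

The same two-pass idea should carry over to Java with one Pattern for the row and a second Matcher looping over find() for the items.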

Upvotes: 0
