Reputation: 21749
id no, no2, list
id1 (3, 5, [t[0][66], y[5][626]])
id2 (3, 5, [t[0][66], y[5][626], z[5][626]])
id2 (3, 5, [t[0][66], y[5][626]])
id3 (32, 54, [t[0][66], y[5][626]])
id4 (3, 541, [t[0][66], y[5][626], u[5][626], y[25][6226]])
id5 (3, 52, [t[0][66], y[5][626]])
id6 (23, 5, [t[0][66], y[5][626]])
How would I go about parsing such text? I tried creating an object from it without much success. The list can vary in size. Java code would be great, but any language, pseudocode, or regular expression is fine.
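Here is a minimal sketch in Python that builds one object per data line using only string splitting. It assumes every data line has the form `id (no, no2, [name[i][j], ...])`, that list items contain no commas, and that the first line is the header. `Record`, `parse_item`, `parse_line`, and `parse_text` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Record:
    # Hypothetical type: one object per data line
    ident: str
    no: int
    no2: int
    items: List[Tuple[str, int, int]]

def parse_item(item):
    # "t[0][66]" -> ("t", 0, 66)
    name, rest = item.split("[", 1)       # "t", "0][66]"
    i, j = rest[:-1].split("][")          # "0", "66"
    return name, int(i), int(j)

def parse_line(line):
    ident, rest = line.split(None, 1)     # "id1", "(3, 5, [...])"
    inner = rest.strip()[1:-1]            # drop the outer ( and )
    no, no2, lst = inner.split(",", 2)    # the list keeps its own commas
    body = lst.strip()[1:-1]              # drop the [ and ]
    # Any number of items (including none) ends up in a plain list
    items = [parse_item(s.strip()) for s in body.split(",") if s.strip()]
    return Record(ident, int(no), int(no2), items)

def parse_text(text):
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return [parse_line(ln) for ln in lines[1:]]  # lines[0] is the header

sample = """id no, no2, list
id1 (3, 5, [t[0][66], y[5][626]])
id4 (3, 541, [t[0][66], y[5][626], u[5][626], y[25][6226]])"""

for record in parse_text(sample):
    print(record)
```

The first line printed is `Record(ident='id1', no=3, no2=5, items=[('t', 0, 66), ('y', 5, 626)])`. Because `items` is an ordinary list, the varying list length needs no special handling. The same steps should carry over to Java with `String.split` and `substring`, keeping in mind that Java's `split` takes a regular expression, so a separator like `][` would need escaping.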
Upvotes: 0
Views: 88
Reputation:
Not your language, but here it is in Python:
import re

def regex(pattern, text):
    # Split text on pattern and drop the empty strings re.split leaves behind
    return [s for s in re.split(pattern, text) if s]

def parse(fname):
    with open(fname) as f:
        data = f.read().splitlines()
    header = regex('[, ]+', data[0])
    print(header)
    for line in data[1:]:
        if not line.strip():
            continue  # Skip blank lines
        # Split on whitespace, then strip ( ) and , from each token
        fields = [regex('[(),]+', field)[0]
                  for field in line.split()]
        fields[3] = fields[3][1:]     # Remove the leading [
        fields[-1] = fields[-1][:-1]  # Remove the trailing ]
        print(fields[0], fields[1], fields[2], fields[3:])

parse("file")
Output ('file' contains your text):
$ python parse.py
['id', 'no', 'no2', 'list']
id1 3 5 ['t[0][66]', 'y[5][626]']
id2 3 5 ['t[0][66]', 'y[5][626]', 'z[5][626]']
id2 3 5 ['t[0][66]', 'y[5][626]']
id3 32 54 ['t[0][66]', 'y[5][626]']
id4 3 541 ['t[0][66]', 'y[5][626]', 'u[5][626]', 'y[25][6226]']
id5 3 52 ['t[0][66]', 'y[5][626]']
id6 23 5 ['t[0][66]', 'y[5][626]']
Upvotes: 2
Reputation: 9225
There is really no reason to write a parser by hand, as there are multiple parser generators available, JavaCC being the most popular. A skeleton process is.
There are many examples
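In the same spirit of reusing an existing parser instead of writing one by hand, here is a sketch in Python rather than JavaCC, so it swaps in a different tool than the one recommended. Everything after the id on each line happens to be a valid Python expression (a tuple of two integers and a list of subscripts), so the standard `ast` module can parse it with no hand-written grammar. It assumes Python 3.9 or later, where `Subscript.slice` holds the index expression directly, and `parse_line` is a hypothetical name.

```python
import ast

def parse_line(line):
    # Split "id1 (3, 5, [...])" into the id and the parenthesized remainder
    ident, rest = line.split(None, 1)
    # The remainder parses as a Python tuple: (no, no2, [items])
    tree = ast.parse(rest, mode="eval").body
    no_node, no2_node, list_node = tree.elts
    items = []
    for node in list_node.elts:
        # name[i][j] parses as Subscript(value=Subscript(value=Name, slice=i), slice=j)
        inner = node.value
        items.append((inner.value.id,
                      ast.literal_eval(inner.slice),
                      ast.literal_eval(node.slice)))
    return ident, ast.literal_eval(no_node), ast.literal_eval(no2_node), items

print(parse_line("id1 (3, 5, [t[0][66], y[5][626]])"))
# ('id1', 3, 5, [('t', 0, 66), ('y', 5, 626)])
```

This only works because the format coincides with Python syntax; a real parser generator is the more general route if the format grows.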
Upvotes: 0
Reputation: 4899
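I tried to write a regex to extract the data, but I don't have time to finish it.
Here's what I have so far (as a Java string literal): "id(\\d) \\((\\d*), (\\d*),\\s*\\,*\\[(\\,*\\s*(\\D)\\[(\\d*)\\]\\[(\\d*)\\])*.*\\]\\)"
Use an online regex tester to refine it.
Group 1 is the id number (a single digit, as written), group 2 is no, and group 3 is no2; the list items are matched after that. Note that a capturing group repeated by a quantifier keeps only its last match, so groups 4 to 7 will describe only the final list item, not every item.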
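One way to finish this approach, sketched with Python's `re` module: a capturing group inside a quantifier keeps only its last match (in Python as in Java), so the sketch captures the whole bracketed list as a single group and then scans that string with a second pattern via `findall`. `LINE_RE`, `ITEM_RE`, and `parse_line` are hypothetical names, and the line pattern assumes single spaces after commas as in the sample. The same patterns should also work with Java's `Pattern`/`Matcher`, using `matches()` for the line and a `find()` loop for the items.

```python
import re

# Fixed fields plus the whole bracketed list captured as one string (group 4)
LINE_RE = re.compile(r"id(\d+) \((\d+), (\d+), \[(.*)\]\)")
# One list item such as t[0][66]
ITEM_RE = re.compile(r"(\w+)\[(\d+)\]\[(\d+)\]")

def parse_line(line):
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    # findall returns every item, unlike a repeated capturing group
    items = ITEM_RE.findall(m.group(4))
    return m.group(1), m.group(2), m.group(3), items

line = "id4 (3, 541, [t[0][66], y[5][626], u[5][626], y[25][6226]])"
print(parse_line(line))
# ('4', '3', '541', [('t', '0', '66'), ('y', '5', '626'), ('u', '5', '626'), ('y', '25', '6226')])

# For comparison: a repeated group keeps only its final iteration
body = "t[0][66], y[5][626], u[5][626], y[25][6226]"
print(re.fullmatch(r"(?:(\w+\[\d+\]\[\d+\])(?:, )?)*", body).group(1))
# y[25][6226]
```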
Upvotes: 0