Reputation: 39
I'm new to regex and I have a problem. It looks easy, but nothing I've tried works.
I have two lines like:
aaa,bbb,111,22.3,2021-01-01 4:4:4.444
ccc,ddd,555,66.7,2021-02-02 8:8:8.888
This regex does what I want: (.+),(.+),(.+),(.+),(.+)
=> 2 matches with 5 groups each
*Match 0 :*
group 1 = aaa
group 2 = bbb
...
group 5 = 2021-01-01 4:4:4.444
*Match 1 :*
group 1 = ccc
...
But with more than 5 "fields" this gets unwieldy. How can I get the same result with something like (.+),"n repetitions"(.+)? Or something else? I tried {n} and *, but neither gives the expected result. I also tried some regexes from other posts.
None of the modifications I tested match the way my first simple regex, (.+),(.+),(.+),(.+),(.+), does.
Edit: I'll go with a Python solution in the end. Thank you all.
Upvotes: 2
Views: 208
Reputation: 25489
An easy way to do this is to build the regex with str.join().
import re

num_cols = 5
re_str = ','.join(['(.+)'] * num_cols)
rexp = re.compile(re_str)
teststr = """aaa,bbb,111,22.3,2021-01-01 4:4:4.444
ccc,ddd,555,66.7,2021-02-02 8:8:8.888"""
re.findall(rexp, teststr)
This gives:
[('aaa', 'bbb', '111', '22.3', '2021-01-01 4:4:4.444'),
('ccc', 'ddd', '555', '66.7', '2021-02-02 8:8:8.888')]
You can change num_cols to make your regex match any number of columns in your csv.
Keep in mind that this approach does not account for quotes in the CSV, which indicate that commas inside the quoted field are not column separators. If you want good, easy CSV parsing, just use the csv module.
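As a rough sketch of the csv route, the snippet below parses the same sample text with csv.reader. The third line with the quoted "x,y" field is a made-up addition (not from the question) to illustrate the quoting case; io.StringIO is only used here to feed a string to the reader as if it were a file.

```python
import csv
import io

# The first two lines are the question's sample; the third is a hypothetical
# line added to show a quoted field that contains a comma.
teststr = """aaa,bbb,111,22.3,2021-01-01 4:4:4.444
ccc,ddd,555,66.7,2021-02-02 8:8:8.888
"x,y",eee,999,1.5,2021-03-03 1:1:1.111"""

# csv.reader accepts any iterable of lines; StringIO makes the string iterable line by line.
rows = list(csv.reader(io.StringIO(teststr)))

for row in rows:
    print(row)
```

Each row comes back as a list of strings, and the number of columns does not have to be known in advance.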
Another caveat is that if your text has more than num_cols columns, the greedy first group absorbs the extra columns, so you still end up with num_cols groups per match. For example, if teststr has six columns but num_cols = 5:
teststr = """aaa,bbb,111,22.3,2021-01-01 4:4:4.444,123
ccc,ddd,555,66.7,2021-02-02 8:8:8.888,456"""
the code above gives:
[('aaa,bbb', '111', '22.3', '2021-01-01 4:4:4.444', '123'),
('ccc,ddd', '555', '66.7', '2021-02-02 8:8:8.888', '456')]
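If merging is not what you want, one possible variation (a sketch, not part of the approach above) is to replace (.+) with a group that cannot cross a comma or a newline, and to anchor the pattern to whole lines. The name strict_re is just illustrative. Under these assumptions, lines with the wrong number of columns simply do not match; note that empty fields would not match either, because of the +.

```python
import re

num_cols = 5

# [^,\n]+ cannot run across a comma or a newline, and ^...$ with re.MULTILINE
# forces each match to cover exactly one whole line.
strict_re = re.compile(
    '^' + ','.join([r'([^,\n]+)'] * num_cols) + '$',
    re.MULTILINE,
)

five_cols = """aaa,bbb,111,22.3,2021-01-01 4:4:4.444
ccc,ddd,555,66.7,2021-02-02 8:8:8.888"""

six_cols = """aaa,bbb,111,22.3,2021-01-01 4:4:4.444,123
ccc,ddd,555,66.7,2021-02-02 8:8:8.888,456"""

matches_five = strict_re.findall(five_cols)  # one 5-tuple per line
matches_six = strict_re.findall(six_cols)    # no line has exactly 5 columns

print(matches_five)
print(matches_six)
```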
Upvotes: 1
Reputation: 23
You could try this:
([^,]+),?
It matches a run of characters that contains no comma, optionally followed by a comma. Applied repeatedly, it returns one field per match, however many fields there are.
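A minimal sketch of how this pattern might be used from Python, assuming re.findall and the sample text from the question (the names field_re, rows and merged are illustrative). One thing to watch: [^,] also matches a newline, so the pattern is applied line by line here; run over the whole text at once, the last field of one line and the first field of the next come back as a single match.

```python
import re

field_re = re.compile(r'([^,]+),?')

teststr = """aaa,bbb,111,22.3,2021-01-01 4:4:4.444
ccc,ddd,555,66.7,2021-02-02 8:8:8.888"""

# Apply the pattern to each line separately: one list of fields per line.
rows = [field_re.findall(line) for line in teststr.splitlines()]
print(rows)

# Applied to the whole text, the field around the line break is merged,
# because [^,]+ happily runs across the newline.
merged = field_re.findall(teststr)
print(merged[4])
```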
Upvotes: 0