Reputation: 444
I am trying to write a regex that extracts legislation references in Brazilian Portuguese (pt-BR), such as:
P.S.: lei (law), decreto (decree), and CPC (Code of Civil Procedure)
My current attempt is:
(?<LEGISLACAO>(art(\.|igos?)\s[\d\.º]+(,\s*?(caput|§),\s*?)?\s+?d[oa]\s+?)*?((lei(\s(estadual|nacional|federal))??|decreto|N?CPC)\s*?(n[º\.])*?\s*?[\d\.\/º]+)(\s*?de\s*?\d{1,2}\s*?de\s*(janeiro|fevereiro|março|abril|maio|junho|julho|agosto|setembro|outubro|novembro|dezembro)\s*?de\s*?\d{2,4})?)
Regex101: https://regex101.com/r/69ggnm/1
But this regex still has some flaws and captures some undesired strings, like:
It also captures the trailing period at the end of some citations, like:
And it misses these:
How could I write a regex that avoids those problems while still matching the correct results?
Upvotes: 1
Views: 86
Reputation: 825
Here's a suggestion, tested with grep -P.
# Earlier versions of PATT1, overridden by the third definition:
# PATT1='(lei|decreto)(( estadual| federal)? nº.?)? \d+.\d{3}(/\d{4}|/\d{2})?'
# PATT1='(lei|decreto) ((estadual |federal )?n(.|º.?) )?\d+.\d{3}(/\d{4}|/\d{2})?'
PATT1='(lei|decreto) (estadual |federal )?(n\. |nº\.? )?\d+(\.\d+)?(/\d+)?'
PATT2='lei \d+ de \d+ de (janeiro|fevereiro|março|abril|maio|junho|julho|agosto|setembro|outubro|novembro|dezembro) de \d{4}'
PATT3='art(igo |\.)\d+, § ?\d+º,? do CPC'
PATT4='art\. \d+\.\d+ do CPC/\d{4}'
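A note on the trailing period raised in the question: the last PATT1 definition only consumes a "." or "/" when digits follow it, whereas the character class `[\d\.\/º]+` in the question's regex accepts a period anywhere, including at the end of a sentence. Below is a minimal sketch of that difference, ported to Python's `re` module (an assumption, since the patterns above were only tested with grep -P). The names `GREEDY_NUMBER` and `STRICT_NUMBER` and the sample sentence are made up for illustration.

```python
import re

# Number part as in the question's regex: "." is allowed anywhere in the run.
GREEDY_NUMBER = re.compile(r"lei\s+[\d./º]+", re.IGNORECASE)

# Number part as in the last PATT1: "." and "/" only count when digits follow.
STRICT_NUMBER = re.compile(r"lei \d+(\.\d+)?(/\d+)?", re.IGNORECASE)

# Hypothetical sentence in which the citation ends a sentence.
text = "Conforme a Lei 11.738/2008. O texto continua."

greedy = GREEDY_NUMBER.search(text).group(0)
strict = STRICT_NUMBER.search(text).group(0)

print(repr(greedy))  # 'Lei 11.738/2008.'  (sentence period included)
print(repr(strict))  # 'Lei 11.738/2008'   (sentence period left out)
```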
If "INPUTFILE" contains the following:
1 Lei 11.738/2008
2 Lei nº 9.394/96
3 Lei Estadual nº 6.834
4 Lei estadual 5.539/09
5 Lei 5.539/2009
6 LEI FEDERAL Nº 11.738/2008.
7 lei nº. 1.060/50
8 Lei n. 11.738/2008
9 Lei n. 94/1947
10 Lei 1614 de 21 de janeiro de 1990
11 Decreto 30.825/2002
12 art. 1.039 do CPC/2015
13 art.334, § 5º do CPC
14 artigo 85, §11º, do CPC
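... then grep -P -o -i -e "\b($PATT2|$PATT1|$PATT3|$PATT4)\b" "INPUTFILE"
seems to match every target expression. PATT2 is listed before PATT1 because PCRE alternation takes the first alternative that matches, so with PATT1 first the match on line 10 would stop at "Lei 1614".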
Would that meet your needs?
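For anyone not using grep, the same approach can be sketched in Python. This is a port under assumptions, not something tested with grep -P: the groups are made non-capturing, the helper name `find_citations` is hypothetical, and the sample lines are the ones listed above without their leading numbers. PATT2 is tried before PATT1, because Python's `re` (like PCRE) takes the first alternative that matches, and PATT1 alone would stop at "Lei 1614".

```python
import re

MONTHS = ("janeiro|fevereiro|março|abril|maio|junho|julho|agosto|"
          "setembro|outubro|novembro|dezembro")

PATT1 = r"(?:lei|decreto) (?:estadual |federal )?(?:n\. |nº\.? )?\d+(?:\.\d+)?(?:/\d+)?"
PATT2 = r"lei \d+ de \d+ de (?:" + MONTHS + r") de \d{4}"
PATT3 = r"art(?:igo |\.)\d+, § ?\d+º,? do CPC"
PATT4 = r"art\. \d+\.\d+ do CPC/\d{4}"

# Alternation is ordered, so the long-date form (PATT2) must come before PATT1.
CITATION = re.compile(rf"\b(?:{PATT2}|{PATT1}|{PATT3}|{PATT4})\b", re.IGNORECASE)

SAMPLE_LINES = [
    "Lei 11.738/2008",
    "Lei nº 9.394/96",
    "Lei Estadual nº 6.834",
    "Lei estadual 5.539/09",
    "Lei 5.539/2009",
    "LEI FEDERAL Nº 11.738/2008.",
    "lei nº. 1.060/50",
    "Lei n. 11.738/2008",
    "Lei n. 94/1947",
    "Lei 1614 de 21 de janeiro de 1990",
    "Decreto 30.825/2002",
    "art. 1.039 do CPC/2015",
    "art.334, § 5º do CPC",
    "artigo 85, §11º, do CPC",
]


def find_citations(text):
    """Return every citation found in text, in order of appearance."""
    return [m.group(0) for m in CITATION.finditer(text)]


for line in SAMPLE_LINES:
    print(find_citations(line))
```

The trailing `\b` is what keeps the final period of "LEI FEDERAL Nº 11.738/2008." out of the match: the number part ends on a digit, and the boundary falls between that digit and the period.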
Edited "PATT1" in order to capture "Lei n. 11.738/2008"
Edited "PATT1" in order to capture "Lei estadual 5.539/09" and "Lei n. 94/1947"
Upvotes: 1