celsowm

Reputation: 444

Problems trying to create a regex for getting legislation references

I am trying to create a regex for getting legislation references in pt-br like:

P.S.: "lei" means law, "decreto" means decree, and CPC is the Civil Procedure Code.

My current attempt is:

(?<LEGISLACAO>(art(\.|igos?)\s[\d\.º]+(,\s*?(caput|§),\s*?)?\s+?d[oa]\s+?)*?((lei(\s(estadual|nacional|federal))??|decreto|N?CPC)\s*?(n[º\.])*?\s*?[\d\.\/º]+)(\s*?de\s*?\d{1,2}\s*?de\s*(janeiro|fevereiro|março|abril|maio|junho|julho|agosto|setembro|outubro|novembro|dezembro)\s*?de\s*?\d{2,4})?)

Regex101: https://regex101.com/r/69ggnm/1

But this regex still has some flaws and captures some undesired strings, like:

It also captures the period at the end of some citations, like:

And it misses these:

What regex would avoid those problems while still matching the correct references?

Upvotes: 1

Views: 86

Answers (1)

Grobu

Reputation: 825

Here's a suggestion tested with grep -P.

# Earlier versions of PATT1, superseded by the definition below:
# PATT1='(lei|decreto)(( estadual| federal)? nº.?)? \d+.\d{3}(/\d{4}|/\d{2})?'
# PATT1='(lei|decreto) ((estadual |federal )?n(.|º.?) )?\d+.\d{3}(/\d{4}|/\d{2})?'

PATT1='(lei|decreto) (estadual |federal )?(n\. |nº\.? )?\d+(\.\d+)?(/\d+)?'
PATT2='lei \d+ de \d+ de (janeiro|fevereiro|março|abril|maio|junho|julho|agosto|setembro|outubro|novembro|dezembro) de \d{4}'
PATT3='art(igo |\.)\d+, § ?\d+º,? do CPC'
PATT4='art\. \d+\.\d+ do CPC/\d{4}'

If "INPUTFILE" contains the following:

Lei 11.738/2008
Lei nº 9.394/96
Lei Estadual nº 6.834
Lei estadual 5.539/09
Lei 5.539/2009
LEI FEDERAL Nº 11.738/2008.
lei nº. 1.060/50
Lei n. 11.738/2008
Lei n. 94/1947
Lei 1614 de 21 de janeiro de 1990
Decreto 30.825/2002
art. 1.039 do CPC/2015
art.334, § 5º do CPC
artigo 85, §11º, do CPC

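... then grep -P -o -i -e "\b($PATT2|$PATT1|$PATT3|$PATT4)\b" "INPUTFILE" matches every target expression in full. PATT2 is listed first because PCRE tries alternatives left to right, and PATT1 on its own would stop at "Lei 1614" in the dated citation.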
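For anyone without grep -P at hand, the same check can be sketched in Python. This is a port under assumptions rather than the command that was tested: the names LEGISLATION, SAMPLES and find_citations are made up for illustration, and the alternation tries PATT2 ahead of PATT1 so that the dated citation is captured whole.

```python
import re

# The four patterns from the answer, copied unchanged.
PATT1 = r"(lei|decreto) (estadual |federal )?(n\. |nº\.? )?\d+(\.\d+)?(/\d+)?"
PATT2 = (r"lei \d+ de \d+ de (janeiro|fevereiro|março|abril|maio|junho|julho"
         r"|agosto|setembro|outubro|novembro|dezembro) de \d{4}")
PATT3 = r"art(igo |\.)\d+, § ?\d+º,? do CPC"
PATT4 = r"art\. \d+\.\d+ do CPC/\d{4}"

# Assumption: PATT2 goes first. Alternation takes the first branch that
# succeeds, and PATT1 also matches the "Lei 1614" prefix of a dated citation.
LEGISLATION = re.compile(
    rf"\b(?:{PATT2}|{PATT1}|{PATT3}|{PATT4})\b", re.IGNORECASE
)

# The sample lines from INPUTFILE.
SAMPLES = [
    "Lei 11.738/2008",
    "Lei nº 9.394/96",
    "Lei Estadual nº 6.834",
    "Lei estadual 5.539/09",
    "Lei 5.539/2009",
    "LEI FEDERAL Nº 11.738/2008.",
    "lei nº. 1.060/50",
    "Lei n. 11.738/2008",
    "Lei n. 94/1947",
    "Lei 1614 de 21 de janeiro de 1990",
    "Decreto 30.825/2002",
    "art. 1.039 do CPC/2015",
    "art.334, § 5º do CPC",
    "artigo 85, §11º, do CPC",
]


def find_citations(text):
    """Return every legislation reference found in text (hypothetical helper)."""
    return [m.group(0) for m in LEGISLATION.finditer(text)]


for line in SAMPLES:
    print(f"{line!r} -> {find_citations(line)}")
```

Each sample should come back as a single match covering the whole line, except that the trailing period of "LEI FEDERAL Nº 11.738/2008." is left out.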

Would that meet your needs?

Update:

Edited "PATT1" in order to capture "Lei n. 11.738/2008"

Edited "PATT1" in order to capture "Lei estadual 5.539/09" and "Lei n. 94/1947"
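On the trailing period raised in the question: the number part of the original regex is the character class [\d\.\/º]+, which accepts a dot anywhere, including a sentence-final one. PATT1 spells the number out as digits, an optional ".digits" group and an optional "/digits" group, so a match can only end on a digit. A minimal Python sketch of the difference, where the sentence and the names GREEDY_NUMBER and STRICT_NUMBER are invented for illustration:

```python
import re

# Number part as in the question: any run of digits, dots, slashes and "º".
GREEDY_NUMBER = re.compile(r"lei\s+[\d./º]+", re.IGNORECASE)

# Number part as in PATT1: every dot or slash must be followed by digits.
STRICT_NUMBER = re.compile(r"lei\s+\d+(?:\.\d+)?(?:/\d+)?", re.IGNORECASE)

sentence = "Conforme a Lei 11.738/2008."  # made-up example sentence

print(GREEDY_NUMBER.search(sentence).group(0))  # Lei 11.738/2008.
print(STRICT_NUMBER.search(sentence).group(0))  # Lei 11.738/2008
```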

Upvotes: 1
