Reputation: 35
I need to read a PDF and extract data from it.
The data format is something like this:
Pattern 1:
Impuestos indirectos excluidos.
Forma de pago: 60 días F.F Según condiciones generales de contratación.
FIRMA: Juan Rubio FECHA: 28/09/2021
Pattern 2:
Impuestos indirectos excluidos.
Forma de pago: 60 días F.F.
Según condiciones generales de contratación.
FIRMA: Juan Rubio FECHA: 20/09/202
From that, I have to extract "60 días F.F."
I tried it this way: \W*(Forma de pago):(\\s)\W*
which is not working.
I am very new to regex and Java. Please note that "Forma de pago" is fixed in each PDF.
In other words, the requirement is: read everything after "Forma de pago:" up to and including "60 días F.F", meaning only the three elements after "Forma de pago:".
Can anyone help with that, please?
Upvotes: 1
Views: 85
Reputation: 627527
You can use
String regex = "\\bForma\\s+de\\s+pago:\\s*(\\S+\\s+\\S+\\s+\\S+)";
See regex demo. Details:
\bForma - a whole word Forma (\b is a word boundary)
\s+ - one or more whitespaces
de - a de string
\s+ - one or more whitespaces
pago: - a pago: string
\s* - zero or more whitespaces
(\S+\s+\S+\s+\S+) - Group 1: one or more non-whitespaces and then two occurrences of one or more whitespaces and one or more non-whitespace chars.
Upvotes: 1
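For completeness, here is a minimal sketch of how the pattern above could be applied with java.util.regex. The class name FormaDePagoExtractor and the extract helper are made up for illustration, and the sketch assumes the PDF text has already been read into a String (how that is done depends on the PDF library, which the question does not name).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class FormaDePagoExtractor {
    // Same regex as in the answer: Group 1 captures the three
    // whitespace-separated tokens that follow "Forma de pago:".
    private static final Pattern FORMA_DE_PAGO =
            Pattern.compile("\\bForma\\s+de\\s+pago:\\s*(\\S+\\s+\\S+\\s+\\S+)");

    // Returns Group 1 of the first match, or an empty Optional if there is no match.
    static Optional<String> extract(String text) {
        Matcher m = FORMA_DE_PAGO.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample texts from the question; \u00ed is "í" and \u00f3 is "ó",
        // escaped so the result does not depend on the source file encoding.
        String pattern1 = "Impuestos indirectos excluidos.\n"
                + "Forma de pago: 60 d\u00edas F.F Seg\u00fan condiciones generales de contrataci\u00f3n.\n";
        String pattern2 = "Impuestos indirectos excluidos.\n"
                + "Forma de pago: 60 d\u00edas F.F.\n"
                + "Seg\u00fan condiciones generales de contrataci\u00f3n.\n";

        System.out.println(extract(pattern1).orElse("no match"));
        System.out.println(extract(pattern2).orElse("no match"));
    }
}
```

Note that Group 1 is simply the next three tokens, so for Pattern 1 it is "60 días F.F" and for Pattern 2 it is "60 días F.F." with the trailing period. If the period is not wanted, it would have to be stripped afterwards.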