Shibaji Sanyal

Reputation: 35

regex to identify specific character

I need to read a PDF and extract data from it.

The data format looks something like this:

Pattern 1:

Impuestos indirectos excluidos. 
Forma de pago: 60 días F.F Según condiciones generales de contratación. 
FIRMA: Juan Rubio FECHA: 28/09/2021

Pattern 2:

Impuestos indirectos excluidos. 
Forma de pago: 60 días F.F. 
 Según condiciones generales de contratación. 
FIRMA: Juan Rubio FECHA: 20/09/202

From that, I have to extract 60 días F.F.

I tried the regex \W*(Forma de pago):(\\s)\W* , but it is not working.

I am very new to regex and Java. Please note that "Forma de pago" is fixed in every PDF.

In words, the requirement is: read whatever follows "Forma de pago:" up to and including "60 días F.F", that is, only the 3 elements after "Forma de pago:".

Can anyone help with this, please?

Upvotes: 1

Views: 85

Answers (1)

Wiktor Stribiżew

Reputation: 627527

You can use the following pattern and read the value from Group 1:

String regex = "\\bForma\\s+de\\s+pago:\\s*(\\S+\\s+\\S+\\s+\\S+)";

Details:

  • \bForma - a whole word Forma (\b is a word boundary)
  • \s+ - one or more whitespaces
  • de - de string
  • \s+ - one or more whitespaces
  • pago: - a pago: string
  • \s* - zero or more whitespaces
  • (\S+\s+\S+\s+\S+) - Group 1: one or more non-whitespace chars, followed by two occurrences of one or more whitespaces plus one or more non-whitespace chars (i.e. three whitespace-separated tokens).
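
As a minimal sketch of how the pattern could be applied in Java, assuming the PDF text has already been extracted into a String by whatever PDF library you use (that step is not shown). The class name PaymentTermsExtractor and the method extract are hypothetical names chosen for illustration; the accented letters are written as Unicode escapes only to keep the source file encoding-independent.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class PaymentTermsExtractor {
    // Same pattern as above: the fixed label, then three whitespace-separated tokens in Group 1.
    private static final Pattern PAYMENT_TERMS =
            Pattern.compile("\\bForma\\s+de\\s+pago:\\s*(\\S+\\s+\\S+\\s+\\S+)");

    // Returns the three tokens after "Forma de pago:", or an empty Optional if the label is absent.
    static Optional<String> extract(String pdfText) {
        Matcher m = PAYMENT_TERMS.matcher(pdfText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample text modelled on "Pattern 2" from the question (assumed already read from the PDF).
        String text = "Impuestos indirectos excluidos. \n"
                + "Forma de pago: 60 d\u00edas F.F. \n"
                + " Seg\u00fan condiciones generales de contrataci\u00f3n. \n";
        System.out.println(extract(text).orElse("(no match)"));
    }
}
```

Note that Group 1 keeps whatever the third token contains, so Pattern 1 gives 60 días F.F and Pattern 2 gives 60 días F.F. with the trailing dot. If you need them identical, strip the trailing dot after matching.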

Upvotes: 1
