Sebastian

Reputation: 281

Get string after string with trailing whitespaces

I'm currently trying to figure out how to use regex and have hit a point I can't get past. These are the test strings that serve as sources (they actually come from OCR'd PDFs):

string1 = 'Beleg-Nr.:12123-23131'; // no spaces after the colon
string2 = 'Beleg-Nr.:    12121-214331'; // a tab after the colon
string3 = 'Beleg-Nr.:        12-982831'; // a tab and spaces after the colon

I want to extract just the numbers. For that I use this pattern:

pattern = '/(?<=Beleg-Nr\.:[ \t]*)(.*)/';

This gets me just the numbers for string1 and string2 but doesn't work for string3 (it gives me additional whitespace before the number).

What am I missing here?

Edit: Thanks for all the helpful advice. The software that OCRs on the fly is able to suppress whitespace on its own in regexes. That did the trick. The resulting pattern is:

(?<=Beleg-Nr\.:[\s]*)(.*)

Upvotes: 1

Views: 137

Answers (4)

urzeit

Reputation: 2909

Just replace the (.*) with a more restrictive pattern ([^ ]+$, for example). Also note that an unescaped . after Beleg-Nr matches any character, not only a literal dot, so it should be written as \. there.

The $ in my example matches the end of the line and thus ensures that everything up to the end of the line is part of the match.

I'd suggest excluding tabs as well:

pattern = '/(?<=Beleg-Nr\.:[ \t]*)([^ \t]+)$/';
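
As a quick check, here is a minimal JavaScript sketch of the pattern above. It assumes an engine with variable-length lookbehind (modern JavaScript has it, many other engines do not). The sample strings are reconstructed from the question, with "\t" standing in for the tab, and the helper name extractNumber is made up for illustration.

```javascript
// Sample strings reconstructed from the question; "\t" stands in for the tab.
const samples = [
  "Beleg-Nr.:12123-23131",     // no whitespace after the colon
  "Beleg-Nr.:\t12121-214331",  // a tab after the colon
  "Beleg-Nr.:\t   12-982831",  // a tab and spaces after the colon
];

// The lookbehind allows optional spaces/tabs after the label;
// the capture group itself must not contain any and must reach the end.
const pattern = /(?<=Beleg-Nr\.:[ \t]*)([^ \t]+)$/;

// Hypothetical helper: returns the captured number or null if nothing matches.
function extractNumber(text) {
  const match = pattern.exec(text);
  return match ? match[1] : null;
}

const numbers = samples.map(extractNumber);
console.log(numbers);
```

Because the group cannot start on a space or tab, the match is pushed past any leading whitespace instead of including it.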

Upvotes: 0

mishik

Reputation: 10003

The problem is that [ ]* matches only spaces. You need to use \s, which matches any whitespace character (in JavaScript, for example, \s covers the space, \f, \n, \r, \t, \v, \u00A0, \u2028, \u2029 and a few other Unicode spaces):

/(?<=Beleg-Nr\.:\s*)(.*)/

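Side note: * is greedy by default, so a \s* that consumes input matches as many whitespace characters as possible, and the last () group then needs no negated [^\s]. Inside a lookbehind nothing is consumed, so the match can already start directly after the colon; if leading whitespace still shows up in the result, start the last group with \S.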
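
To see how the lookbehind variant treats leading whitespace in a given engine, a small JavaScript sketch like the following can help. The sample string is an assumption modeled on string3 from the question, and the (\S.*) variant is an illustrative alternative, not part of the pattern above.

```javascript
// Modeled on string3: a tab and three spaces after the colon.
const sample = "Beleg-Nr.:\t   12-982831";

// The lookbehind is already satisfied directly after the colon
// (with zero whitespace), so (.*) starts there and keeps the whitespace.
const loose = /(?<=Beleg-Nr\.:\s*)(.*)/.exec(sample)[1];

// Requiring the group to begin with a non-whitespace character
// moves the start of the match past the tab and spaces.
const strict = /(?<=Beleg-Nr\.:\s*)(\S.*)/.exec(sample)[1];

console.log(JSON.stringify(loose));  // "\t   12-982831"
console.log(JSON.stringify(strict)); // "12-982831"
```

In JavaScript the first pattern still returns the leading whitespace, which matches the behavior described in the question; tools that trim whitespace on their own may hide this.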

Upvotes: 2

jerone

Reputation: 16871

This works for me (the number ends up in the second capture group):

/(Beleg-Nr\.:\s*)(.*)/

http://regexr.com?35rj6
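
A minimal JavaScript sketch of how the second group could be read out, assuming sample strings reconstructed from the question ("\t" stands in for the tab). The variable names are illustrative only.

```javascript
// Sample strings reconstructed from the question.
const lines = [
  "Beleg-Nr.:12123-23131",
  "Beleg-Nr.:\t12121-214331",
  "Beleg-Nr.:\t   12-982831",
];

// Group 1 consumes the label plus any whitespace; group 2 holds the rest.
const re = /(Beleg-Nr\.:\s*)(.*)/;

const extracted = lines.map((line) => {
  const m = re.exec(line);
  return m ? m[2] : null; // null when the label is not present
});

console.log(extracted);
```

Since no lookbehind is involved, the greedy \s* consumes the whitespace before the second group starts, and the pattern also works in engines without lookbehind support.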

Upvotes: 2

Alma Do

Reputation: 37365

You can use the special sequence "\s" to match both spaces and tabs, so you don't need to combine them in a character class via [].

Upvotes: 3
