Reputation: 13
I am trying to capture the following only:
The number comes right after a . or , or ' and can be any digit, and anything can come before or after it. For example, in .1 abc I only want to capture the 1, and in abc,2 I only want to capture the 2.
So if we have the following:
10,000
1.1
,1
.2
'3
'100.000
.200,000
'300'000
abc'100,000
abc.4
abc,5
abc'6
abc 7
,8 abc
.9 abc
'10 abc
.11abc
,12abc
I have the following python regex:
((?<![0-9])([.,':’])([0-9]{1,4}))
The problem is that it captures '100 in '100.000, .200 in .200,000 and '300 in '300'000. How can I stop it from capturing these? It shouldn't capture anything from '100.000 or .200,000 or '300'000 or abc'100,000 and so on.
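To reproduce the behaviour described above outside of pythex, here is a minimal sketch that runs the pattern from the question over a few of the sample values. The helper name captured_digits is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
QUESTION_PATTERN = re.compile(r"((?<![0-9])([.,':’])([0-9]{1,4}))")

def captured_digits(text):
    """Return the digits (group 3) of every match found in text."""
    return [m.group(3) for m in QUESTION_PATTERN.finditer(text)]

for sample in ["'100.000", ".200,000", "'300'000", "abc.4", "10,000"]:
    print(f"{sample!r}: {captured_digits(sample)}")
```

The lookbehind only checks the character before the separator, so a separator that starts a number (as in '100.000) passes the check, and nothing in the pattern looks at what follows the captured digits.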
I use this to test my regex: https://pythex.org/
Why am I doing this? I am converting InDesign files to HTML, and in some of the conversions the footnotes are not working, so I am using RegReplace on SublimeText to find the footnotes and replace them with specific HTML.
I just want to make this clearer, as someone commented that it's not clear.
I want to capture a digit that has a . , ' before it, for example:
This is a long string with subscript footnote numbers like this.1 Sometimes they have a dot before the footnote number and sometimes they have a comma,2 Then there are times when it has an apostrophe'3 Now the problem with my regex was that it was capturing the numbers after a dot, comma or apostrophe for values like this 30,000 or 20.000 or '10,000. I don't want to capture anything like that except like this'4 or like this.5 or like this ,6
What I was trying to do with my regex is look before the dot, comma or apostrophe to see whether there is a digit and, if there is, capture none of it, e.g. '10,000 or .20.000 or ,15'000
mypetlion got the closest, but his regex was not capturing the last 3 in the list. Let me see what I can do with his regex.
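One way the rule described above could be written is to keep the negative lookbehind and add a negative lookahead that rejects digits followed by another separator and digit. This is only a sketch under my reading of the requirement: the name find_footnotes is hypothetical, and the separator set is assumed to be dot, comma, straight apostrophe and curly apostrophe.

```python
import re

# Assumed separators: . , ' and the curly apostrophe.
# (?<![0-9])  no digit directly before the separator (skips 30,000)
# ([0-9]+)    the footnote number itself
# (?![0-9]|[.,'’][0-9])  the number must not continue with more digits
#             or with another separator plus digit (skips '10,000)
FOOTNOTE = re.compile(r"(?<![0-9])[.,'’]([0-9]+)(?![0-9]|[.,'’][0-9])")

def find_footnotes(text):
    """Return the footnote numbers found in text, as strings."""
    return FOOTNOTE.findall(text)

text = ("like this.1 Sometimes a comma,2 Then an apostrophe'3 but not "
        "30,000 or 20.000 or '10,000, only this'4 or this.5 or this ,6")
print(find_footnotes(text))
```

The [0-9] alternative inside the lookahead matters: without it, the engine could backtrack and capture just the 10 or the 1 of '100.000.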
Upvotes: 1
Views: 909
Reputation: 147
If I understand you correctly and you only want the next digit after ANY comma, period, or single quote then (([\.,'’])([0-9]))
should do the trick.
If I misunderstand and you have the negative lookbehind for a reason then try this:
((?<![0-9])([\.,'’])([0-9]))
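To see how the two patterns above differ, here is a small sketch that runs both over a few of the sample values from the question. The helper name digits and the variable names are made up for illustration.

```python
import re

# Both patterns exactly as given above; group 3 is the single digit.
any_digit = re.compile(r"(([\.,'’])([0-9]))")
guarded = re.compile(r"((?<![0-9])([\.,'’])([0-9]))")

def digits(pattern, text):
    """Return group 3 of every match of pattern in text."""
    return [m.group(3) for m in pattern.finditer(text)]

for sample in ["'100.000", "10,000", ".11abc"]:
    print(sample, digits(any_digit, sample), digits(guarded, sample))
```

Note that both patterns take a single digit, so .11abc yields 1 rather than 11, and the guarded version still picks up the 1 in '100.000.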
Upvotes: 0
Reputation: 163207
If I am not mistaken, you don't want to capture '100.000 or .200,000 or '300'000 or abc'100,000, but you do want to capture the rest that contains [.,'] followed by one or more digits.
You could match what you don't want, then use an alternation | and capture in a group what you do want to match:
[.,']\d+[.,']\d+|[.,'](\d+)
Details
[.,']\d+[.,']\d+ Match one of the characters in the character class and one or more digits, then again one of the characters in the character class and one or more digits (the pattern that you don't want to capture)
| Or
[.,'](\d+) Match one of the characters in the character class and capture in a group one or more digits
Your values will be in captured group 1.
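As a rough sketch of how this could be used from Python, the two alternatives described above are joined with | and only the matches where group 1 took part are kept. The function name wanted_numbers is hypothetical.

```python
import re

# Left alternative: consume what should be skipped (no group).
# Right alternative: separator followed by the wanted digits in group 1.
PATTERN = re.compile(r"[.,']\d+[.,']\d+|[.,'](\d+)")

def wanted_numbers(text):
    """Return group 1 for every match where the right alternative matched."""
    return [m.group(1) for m in PATTERN.finditer(text)
            if m.group(1) is not None]

for sample in ["'100.000", "abc'100,000", "abc.4", ".11abc", "10,000"]:
    print(sample, wanted_numbers(sample))
```

When the left alternative matches, group 1 is None, which is why those matches are filtered out. With the pattern as written, a plain 10,000 still yields 000 in group 1, so it may need an extra check if such values occur in the text.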
Upvotes: 1