Charles

Reputation: 1

Regex for negative numbers with leading zeros, and for words with apostrophes

Hey, I need a regex that removes the leading zeros. Right now I am using the code below. It does work, it just doesn't keep the negative sign.

String regex = "^+(?!$)";
String numbers = txaTexte.getText().replaceAll(regex, "");

After that, I split numbers so that the values end up in an array.

Input:
-0005
0003
-87

Output:
-5
3
-87

I was also wondering what regex I could use to get the results below. The words before the arrow are the input and the words after it are the output; the text is in French. Right now I am using the following, and it works, but not with the apostrophe.

String[] tab = txaTexte.getText().split("(?:(?<![a-zA-Z])'|'(?![a-zA-Z])|[^a-zA-Z'])+");

Un beau JOUR. —> Un/beau/JOUR

La boîte crânienne —> La/boîte/crânienne

C’était mieux aujourd’hui —> C’/était/mieux/aujourd’hui

qu’autrefois —> qu’/autrefois

D’hier jusqu’à demain! —> D’/hier/jusqu’/à/demain

Dans mon sous-sol —> Dans/mon/sous-sol
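The split above relies on [a-zA-Z] and the ASCII apostrophe ', while the sample text contains accented letters (î, â, é, à) and the typographic apostrophe ’ (U+2019), which is a likely reason it fails. A minimal sketch of one alternative: match the tokens with Matcher.find() instead of splitting between them. It assumes aujourd’hui is the only word whose apostrophe must stay inside it (a real tokenizer would need a longer list of such words); the class FrenchTokens and its method tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrenchTokens {
    // Alternatives are tried left to right at each position:
    // 1. "aujourd'hui" kept whole (assumed exception list of one word),
    // 2. an elided word ending in an apostrophe (ASCII ' or typographic U+2019), e.g. "C'", "qu'",
    // 3. a run of Unicode letters (\p{L}), optionally hyphenated, e.g. "sous-sol".
    private static final Pattern TOKEN = Pattern.compile(
        "(?i)aujourd['\u2019]hui|\\p{L}+['\u2019]|\\p{L}+(?:-\\p{L}+)*");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        // Punctuation and spaces are never matched, so they are simply skipped.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", tokenize("D\u2019hier jusqu\u2019\u00e0 demain!")));
    }
}
```

Joining the tokens with "/" reproduces the arrow notation used in the examples.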

Upvotes: 0

Views: 438

Answers (3)

Thomas

Reputation: 88727

If you want to keep -000, 000, +0000 etc. as just 0, try this regex:

`^[-+]?0*(0)$|^([-+])?0*(\d+)$`

Breakdown:

  • ^...$ means the entire string should match (^ is the start of the string, $ is the end)
  • ...|... is an alternative
  • [-+] is a character class that contains only the plus and minus characters. Note that - has a special meaning ("range") in character classes if it's not the first or last character.
  • (...) is a capturing group which can be referenced in the replacement string by $number, where number is the 1-based position of the group within the regex (groups are numbered by their opening parenthesis: the first group to start is no. 1, etc.)
  • ?, * and + are quantifiers when used outside character classes, meaning "0 or 1 occurrence" (?), "any number of occurrences, including none" (*) and "at least one occurrence" (+)
  • ^[-+]?0*(0)$ thus means: the entire string must be an optional sign, followed by any number of zeros and ending with a single zero which is captured as group 1.
  • alternatively ^([-+])?0*(\d+)$ means the entire string must be an optional sign which is captured as group 2, followed by any number of zeros and ending in at least one digit which is captured as group 3.

This regex can then be used with String.replaceAll(regex, "$1$2$3") to keep either the single 0 from group 1, or the optional sign and the number without leading zeros from groups 2 and 3. Groups that did not take part in the match are replaced by empty strings, which is why this works.
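A small sketch that runs the regex above through String.replaceAll on the inputs from the question plus a few all-zero cases. The class name LeadingZerosDemo and the helper strip are hypothetical; note the backslash has to be doubled inside a Java string literal.

```java
public class LeadingZerosDemo {
    // The regex from this answer as a Java string literal (\d written as \\d).
    static final String REGEX = "^[-+]?0*(0)$|^([-+])?0*(\\d+)$";

    static String strip(String s) {
        // Groups that did not take part in the match expand to "" in the replacement.
        return s.replaceAll(REGEX, "$1$2$3");
    }

    public static void main(String[] args) {
        String[] inputs = {"-0005", "0003", "-87", "-000", "+0000", "0"};
        for (String in : inputs) {
            System.out.println(in + " -> " + strip(in));
        }
    }
}
```

Here "-0005" becomes "-5", "0003" becomes "3", "-87" is unchanged, and "-000", "+0000" and "0" all become "0".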

However, regular expressions can be slow, especially if you have to process a lot of strings.

One improvement would be to compile the pattern only once and reuse it:

import java.util.regex.Pattern;

//compile the pattern once and reuse it
Pattern p = Pattern.compile("^[+-]?0*(0)$|^([+-])?0*(\\d+)$");

//build a matcher from the pattern and the input string, and do the replacement
String number = p.matcher(txaTexte.getText()).replaceAll("$1$2$3");
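Without flags, ^ and $ only match at the start and end of the whole input, so if the text area holds several numbers on separate lines (as in the question), the pattern above finds no match and the text comes back unchanged. A sketch assuming one number per line: compiling the same pattern with Pattern.MULTILINE makes ^ and $ match at every line boundary. The class MultilineStrip and the sample string standing in for txaTexte.getText() are hypothetical.

```java
import java.util.regex.Pattern;

public class MultilineStrip {
    // MULTILINE: ^ and $ match at the start/end of every line,
    // not only at the start/end of the whole input.
    static final Pattern P = Pattern.compile(
        "^[+-]?0*(0)$|^([+-])?0*(\\d+)$", Pattern.MULTILINE);

    static String strip(String text) {
        return P.matcher(text).replaceAll("$1$2$3");
    }

    public static void main(String[] args) {
        // Stand-in for txaTexte.getText(): one number per line (assumption).
        String text = "-0005\n0003\n-87";
        System.out.println(strip(text));
    }
}
```

The line breaks are left untouched, so the result can still be split into an array afterwards.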

If you're working on a large number of strings (> 10,000), you might want to use specialized plain parsing without regex. Consider something like the following, which on my machine is about 10x faster than the regex approach with a reused pattern:

public static String stripLeadingZeros(String s) {
    //nothing to do, return the string as is
    if( s == null || s.isEmpty() ) {
        return s;
    }
    
    char[] chars = s.toCharArray();
    int usedChars = 0;

    //check if the first character is the sign
    boolean hasSign = false;
    if (chars[0] == '-' || chars[0] == '+') {
        hasSign = true;
        usedChars++;

        //special case: just a sign
        if (chars.length == 1) {
            return s;
        }
    }
    
    //process the rest of the characters
    boolean stripZeros = true;
    for( int i = usedChars; i < chars.length; i++) {
        //not a digit, this isn't a simple integer, stop processing and keep the original string
        if( chars[i] < '0' || chars[i] > '9') {
            return s;
        }
        
        //are we still in zero-stripping mode
        if( stripZeros) {
            if( chars[i] == '0') {
                continue; //check next char
            }
            
            //we've found a non-zero char, keep it and end zero-stripping mode
            if(chars[i] >= '1' && chars[i] <= '9') {
                stripZeros = false;
            }
        }
        
        //since we are ignoring leading zeros, we just move all digits of the actual number to the left
        chars[usedChars++] = chars[i];                  
    }
    
    //handle special case of number 0 (with optional sign)
    if( usedChars == (hasSign ? 1 : 0)) {
        chars[0] = '0';
        usedChars = 1;
    }
    
    return new String(chars,0, usedChars);
}
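For comparison, a more compact sketch of the same character-scanning idea, using String.charAt and substring instead of a char[] buffer. It aims for the same results as the method above (all-zero input, signed or not, becomes "0"; anything that is not an optional sign followed by digits is returned unchanged). The class StripZeros and method strip are hypothetical names, and this variant has not been benchmarked against either approach.

```java
public class StripZeros {
    static String strip(String s) {
        if (s == null || s.isEmpty()) {
            return s; // nothing to do
        }
        int start = (s.charAt(0) == '-' || s.charAt(0) == '+') ? 1 : 0;
        if (start == s.length()) {
            return s; // just a sign
        }
        // Not a simple integer: keep the original string.
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return s;
            }
        }
        // Skip the leading zeros after the optional sign.
        int firstNonZero = start;
        while (firstNonZero < s.length() && s.charAt(firstNonZero) == '0') {
            firstNonZero++;
        }
        if (firstNonZero == s.length()) {
            return "0"; // only zeros: the sign is dropped as well
        }
        return s.substring(0, start) + s.substring(firstNonZero);
    }

    public static void main(String[] args) {
        for (String n : new String[] {"-0005", "0003", "-87", "+000", "12a"}) {
            System.out.println(n + " -> " + strip(n));
        }
    }
}
```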

Upvotes: 0

WJS

Reputation: 40062

Here is one way. This preserves the sign.

  • Capture the optional sign.
  • Match 0 or more leading zeros.
  • Capture the 1 or more digits that follow.

String regex = "^([+-])?0*(\\d+)";
String [] data = {"-1415", "+2924", "-0000123", "000322", "+000023"};
for (String num : data) {
    String after = num.replaceAll(regex, "$1$2");
    System.out.printf("%8s --> %s%n", num , after);
}

This prints:

   -1415 --> -1415
   +2924 --> +2924
-0000123 --> -123
  000322 --> 322
 +000023 --> +23
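Two consequences of the regex as written, shown in a small sketch (the class SignedZerosDemo and helper strip are hypothetical): a signed all-zero input keeps its sign, so "-0000" becomes "-0", whereas the regex in the first answer turns it into "0"; and because there is no $ anchor, any characters after the digits are left in place.

```java
public class SignedZerosDemo {
    // Same regex as above: no $ anchor and no MULTILINE flag.
    static final String REGEX = "^([+-])?0*(\\d+)";

    static String strip(String s) {
        return s.replaceAll(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        // 0* gives back one zero so that \d+ can match, hence "-0" and "0".
        for (String s : new String[] {"-0000", "0000", "-007abc"}) {
            System.out.printf("%8s --> %s%n", s, strip(s));
        }
    }
}
```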

Upvotes: 2

The fourth bird

Reputation: 163427

You could capture an optional hyphen in group 1, then match one or more zeros, and capture one or more digits starting with a digit 1-9 in group 2.

^(-?)0+([1-9]\d*)$
  • ^ Start of string
  • (-?) Capture group 1, match optional hyphen
  • 0+ Match 1+ zeros
  • ([1-9]\d*) Capture group 2, match 1+ digits starting with a digit 1-9
  • $ End of string

See a regex demo.

In the replacement, use group 1 followed by group 2 ($1$2).

String regex = "^(-?)0+([1-9]\\d*)$";
String text = "-0005";
String numbers = text.replaceAll(regex, "$1$2"); // "-5"
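A sketch of how this pattern behaves on inputs that do not fit its shape (the class HyphenZerosDemo and helper strip are hypothetical): because it requires at least one leading zero and a digit 1-9, "-87" and "0000" do not match and are returned unchanged, and a leading + is not handled.

```java
public class HyphenZerosDemo {
    // Same pattern as above: optional hyphen, at least one zero, then a 1-9 digit.
    static final String REGEX = "^(-?)0+([1-9]\\d*)$";

    static String strip(String s) {
        return s.replaceAll(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        // Non-matching inputs come back unchanged from replaceAll.
        for (String s : new String[] {"-0005", "0003", "-87", "0000", "+0005"}) {
            System.out.println(s + " -> " + strip(s));
        }
    }
}
```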

Upvotes: 2
