Jack Pilowsky

Reputation: 2303

Regex Base64 image with headers

I'm having an issue with base64 images that are not converting correctly sometimes. I need a way to test if the image is in correct base64 format before converting it so I can try to look further into the problem. I have found some regex formulas online, but I think they only expect the string without the headers. I have the string with the headers. I tried to add the headers, but it keeps breaking.

The original:

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

Then I added the headers, but it doesn't work:

^([data:image/png;base64,][A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

Thank you

Upvotes: 1

Views: 2731

Answers (2)

Regular Jo

Reputation: 5500

You may notice the use of [square brackets] in the original regex. These create character classes, which match any single character listed inside them, so [data:image/png;base64,] matches one character out of d, a, t, :, i, m, and so on, not the literal header. Instead, put the header in a non-capturing group and, since I think you're trying to make the header optional, follow it with ?, like this: (?:data:image/png;base64,)?
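To make the difference concrete, here is a minimal Python sketch (Python and the variable names are my choice for illustration, not part of the question) comparing the bracketed form with the grouped form on their own:

```python
import re

# Character class: matches exactly ONE character out of the listed set.
char_class = re.compile(r"[data:image/png;base64,]")

# Non-capturing group: matches the whole literal header; ? makes it optional.
optional_header = re.compile(r"(?:data:image/png;base64,)?")

header = "data:image/png;base64,"

# The class accepts a single character such as "d" ...
print(char_class.fullmatch("d") is not None)          # True
# ... but not the header as a whole.
print(char_class.fullmatch(header) is not None)       # False

# The group accepts the full header, or nothing at all.
print(optional_header.fullmatch(header) is not None)  # True
print(optional_header.fullmatch("") is not None)      # True
```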

^(?:data:image/png;base64,)?([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

^                                 # Anchors to the beginning of the string.
(?:data:image/png;base64,         # Opens NCG1
                                    # Literal data:image/png;base64,
)?                                # Closes NCG1
                                    # ? repeats zero or one times
(                                 # Opens CG1
 [A-Za-z0-9+/]                    # Character class (any of the characters within)
                                    # Anything between A and Z
                                    # Anything between a and z
                                    # Anything between 0 and 9
                                    # Any of: +/
 {4}                              # Repeats 4 times.
)*                                # Closes CG1
                                    # * repeats zero or more times
(                                 # Opens CG2
 [A-Za-z0-9+/]                    # Character class (any of the characters within)
                                    # Anything between A and Z
                                    # Anything between a and z
                                    # Anything between 0 and 9
                                    # Any of: +/
 {4}                              # Repeats 4 times.
 |                                # Alt (CG2)
 [A-Za-z0-9+/]                    # Character class (any of the characters within)
                                    # Anything between A and Z
                                    # Anything between a and z
                                    # Anything between 0 and 9
                                    # Any of: +/
 {3}                              # Repeats 3 times.
 =                                # Literal =
 |                                # Alt (CG2)
 [A-Za-z0-9+/]                    # Character class (any of the characters within)
                                    # Anything between A and Z
                                    # Anything between a and z
                                    # Anything between 0 and 9
                                    # Any of: +/
 {2}                              # Repeats 2 times.
 ==                               # Literal ==
)                                 # Closes CG2
$                                 # Anchors to the end of the string.
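A commented layout like the one above is close to what Python's re.VERBOSE flag accepts directly. The sketch below is only an illustration: it assumes the header may appear at most once, at the very start of the string, uses non-capturing groups throughout, and the name DATA_URI and the sample strings are made up for the example.

```python
import re

# Assumption: the header is optional and may appear only once, at the start.
DATA_URI = re.compile(
    r"""
    ^                              # start of the string
    (?:data:image/png;base64,)?    # optional literal header
    (?:[A-Za-z0-9+/]{4})*          # zero or more full 4-character groups
    (?:
        [A-Za-z0-9+/]{4}           # final group without padding
      | [A-Za-z0-9+/]{3}=          # final group with one padding character
      | [A-Za-z0-9+/]{2}==         # final group with two padding characters
    )
    $                              # end of the string
    """,
    re.VERBOSE,
)

samples = [
    "data:image/png;base64,iVBORw0KGgo=",  # header + padded payload
    "iVBORw0KGgo=",                        # payload only
    "data:image/png;base64,iVBORw0KGgo",   # length is not a multiple of 4
    "iVBOdata:image/png;base64,Rw0K",      # header in the middle
]
for sample in samples:
    print(sample, DATA_URI.match(sample) is not None)
```

The first two samples match and the last two do not.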

If, however, you want to require the header, drop the ? quantifier (the non-capturing group is then no longer needed) and keep the literal header once, directly after ^, outside the repeated group:

^data:image/png;base64,([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$
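As a quick check of the required-header variant, here is a small Python sketch. It assumes the header must appear exactly once, at the start, before any number of 4-character groups; the function name has_required_header is hypothetical.

```python
import re

# Assumption: exactly one header at the start, followed by the base64 payload.
REQUIRED_HEADER = re.compile(
    r"^data:image/png;base64,"
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$"
)


def has_required_header(text):
    """Return True if text is the PNG header followed by well-formed base64."""
    return REQUIRED_HEADER.match(text) is not None


print(has_required_header("data:image/png;base64,AAAABBBBCCCC"))  # True
print(has_required_header("AAAABBBBCCCC"))                        # False
# A second header inside the payload is rejected.
print(has_required_header("data:image/png;base64,AAAAdata:image/png;base64,BBBB"))  # False
```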

Upvotes: 3

Mofi

Reputation: 49086

The regular expression

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

What do all those characters mean?

^ ... find a string which starts at the beginning of a line or string buffer.

( ... ) ... define a marking group, either for back referencing the string found by the expression inside the parentheses or, as here, for applying a multiplier. Grouping an expression just to apply a multiplier is usually better done with a non marking group, i.e. with (?: ... ), where the question mark and the colon immediately after the opening parenthesis make the group a non marking group.

[ ... ] ... define a positive class of characters which means that any of the characters within the square brackets should be found once for a positive match. [^ ... ] would be a negative character class definition which means any character except one of the characters in the square brackets should be found.

[A-Za-z0-9+/] ... a character being either an upper case or a lower case letter from ASCII table or a digit or the plus sign or a slash.

{4} ... is a multiplier and means previous expression or character exactly four times.

* ... is also a multiplier and means previous expression or character 0 or more times.

| ... means OR.

$ ... means end of line without matching line terminator or end of string buffer.

So this expression means:

  1. Find a string which starts at beginning of a line or the string buffer,
  2. consisting of 0 or more substrings with exactly 4 characters each consisting itself of letters, digits, plus signs, or slash characters,
  3. and last substring at end of line or string buffer is either
    • also a string consisting of 4 letters, digits, plus signs, or slash characters,
    • OR a string consisting of just 3 letters, digits, plus signs, or slashes and an equal sign as fourth character,
    • OR a string consisting of just 2 letters, digits, plus signs, or slashes and two equal signs as third and fourth character.
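The three possible endings listed above correspond to input whose byte length leaves a remainder of 0, 2, or 1 when divided by 3. A short Python sketch (the language and the sample bytes are my choice for illustration) encodes one input of each kind and shows what the second group captures:

```python
import base64
import re

BASE64_RE = re.compile(
    r"^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$"
)

for raw in (b"abc", b"ab", b"a"):
    encoded = base64.b64encode(raw).decode("ascii")
    match = BASE64_RE.match(encoded)
    # Group 2 is the last 4-character block, including any padding.
    print(raw, encoded, match.group(2))

# Wrong padding or a length that is not a multiple of 4 is rejected.
print(BASE64_RE.match("YQ=") is None)   # True
print(BASE64_RE.match("YWJ") is None)   # True
```

The loop prints YWJj, YWI= and YQ== as the final blocks for the 3-byte, 2-byte and 1-byte inputs respectively.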

To additionally allow an optional header string at the beginning of the line or string buffer, the expression should be modified to:

^(?:data:image/png;base64,)?(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

The question mark after the non marking group (?:data:image/png;base64,) means that the previous expression (here just a fixed string) may occur zero or one times.

As you can see, I also changed the 2 marking groups into 2 non marking groups by inserting ?: after the opening parentheses.
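For testing a string before converting it, the check can be wrapped in a small helper. The following is a minimal Python sketch under a few assumptions of mine: the header is stripped with plain string handling rather than inside the regex, the helper name decode_png_data is hypothetical, and re.fullmatch is used instead of ^ and $ because in Python $ also matches just before a trailing newline.

```python
import base64
import re

HEADER = "data:image/png;base64,"

# Payload part of the expression above, without anchors (fullmatch anchors it).
PAYLOAD_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)"
)


def decode_png_data(text):
    """Return the decoded bytes, or None if text is not well-formed base64."""
    # Drop the optional header before checking the payload.
    payload = text[len(HEADER):] if text.startswith(HEADER) else text
    # fullmatch rejects a trailing newline, which ^...$ with match would accept.
    if PAYLOAD_RE.fullmatch(payload) is None:
        return None
    return base64.b64decode(payload, validate=True)


print(decode_png_data("data:image/png;base64,YWJj"))  # b'abc'
print(decode_png_data("YWI="))                        # b'ab'
print(decode_png_data("YWJj\n"))                      # None
print(decode_png_data("YWJ"))                         # None
```

Note that passing this check only shows the string is syntactically valid base64; it does not show that the decoded bytes are a valid PNG image.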

Upvotes: 2
