Reputation: 12341
I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:
A Bitcoin address, or simply address, is an identifier of 27-34 alphanumeric characters, beginning with the number 1 or 3 [...]
I figured it would look something like this
/^[13][a-zA-Z0-9]{27,34}/
Thing is, I'm not good with regular expressions and I haven't found a single source to confirm this would not create false negatives.
I've found one online that's ^1[1-9A-Za-z][^OIl]{20,40}
, but I don't even know what the [^OIl]
part means and it doesn't seem to match the 3
a Bitcoin address could start with.
Upvotes: 51
Views: 51729
Reputation: 91
/^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$|^[bB][cC]1[pPqQ][a-zA-Z0-9]{38,58}$/g
It covers all four possible BTC address formats (from modern to legacy):
Taproot address - P2TR
Begins with bc1p
, length from 42 to 62, case insensitive
SegWit address - P2WPKH
Begins with bc1q
, length from 42 to 62, case insensitive
Script address - P2SH
Begins with 3
, length from 26 to 35 (cannot have small el l
, capital eye I
, capital ou O
and zero 0
), case sensitive
Legacy address - P2PKH
Begins with 1
, length from 26 to 35 (cannot have small el l
, capital eye I
, capital ou O
and zero 0
), case sensitive
Examples:
1MbeQFmHo9b69kCfFa6yBr7BQX4NzJFQq9
3EmUH8Uh9EXE7axgyAeBsCc2vdUdKkDqWK
bc1qj89046x7zv6pm4n00qgqp505nvljnfp6xfznyw
bc1p8denc9m4sqe9hluasrvxkkdqgkydrk5ctxre5nkk4qwdvefn0sdsc6eqxe
P.S. Not sure about length, can't even find 26-character legacy wallet, but search engines said that 26 is a min and 62 is a max (to be updated)
Upvotes: 4
Reputation: 748
for mainnet bitcoin
/^([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
if you don't want to understand the above regex you can skip the detail below
breaking it down
For regular addresses
/[13]{1}/
address will start with 1 or 3, {1} defines that only match one character in square bracket
/[13]{1}[a-km-zA-HJ-NP-Z1-9]/
cannot have l (small el), I (capital eye), O (capital O) and 0 (zero)
/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/
can be 27 to 34 characters long, remember we already checked the first character to be 1 or 3, so remaining address will be 26 to 33 characters long
For segwit
/bc1/
starts with bc1
/bc1[a-z0-9]/
can only contain lower case letters and numbers
/bc1[a-z0-9]{39,59}/
can be 42 to 62 characters long, we already checked first three characters to be bc1, so remaining address will be 39 to 59 characters long
Upvotes: 2
Reputation: 868
For matching legacy, nested SegWit, and native SegWit addresses:
/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
Source: Regex for Bitcoin Addresses.
Upvotes: -1
Reputation: 2387
I am not into complicated solutions and this regex served the purpose for the most simplest validation, when you just don't want to receive complete nonsense.
\w{25,}
Upvotes: -2
Reputation: 17181
Based on answer of runeks and Erhard Dinhobl I got this that accepts bech32 and legacy:
\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b
Including testnet address:
\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b
Only testnet:
\b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b
Upvotes: 7
Reputation: 1832
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
will match a string that starts with either 1
or 3
and, after that, 25 to 34 characters of either a-z, A-Z, or 0-9, excluding l
, I
, O
and 0
(not valid characters in a Bitcoin address).
Upvotes: 64
Reputation: 17
As the OP didn't provide a specific use case (only matching criteria) and I came across this in researching methods to detect BitCoin addresses, wanted to post back and share with the community.
These RegEx provided will find BitCoin addresses either at the start of a line and/or end of the line. My use case was to find BitCoin addresses in the body of an email given the rise of blackmail/sextortion (Reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/) - so these weren't effective solutions (as outlined later). The proposed RegEx will catch many FPs in email, due to filenames and other identifiers within URLs. I am not knocking the solutions, as they work for certain use cases, but they simply don't work for mine. One variation caught many spam emails within a short timeframe of passive alerting (examples follow).
Here are my test cases:
--------------------------------------------------------
BitCoin blackmail formats observed (my org and online):
--------------------------------------------------------
BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4
BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67
Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
--------------------------------------------------------
Other possible BitCoin test cases I added:
--------------------------------------------------------
- What if text comes before and/or after on same line? Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
- Standalone address:
1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72
--------------------------------------------------------
Redacted Body content generating FPs from spam emails:
--------------------------------------------------------
src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"
"cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah
src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg"
src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"
href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah
Some RegEx samples I tested (won't list those I'd knock for greedy globbing with backtraces):
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
(Too narrow and misses BitCoin addresses within a paragraph)
(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
(Still misses text after BTC on same line and triples execution time)
\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
(Too broad and catches URL formats)
The current RegEx I am evaluating which catches all my known/crafted sample cases and eliminates known FPs (specifically avoiding end of sentence period for URL filename FPs):
[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s
One reference point for execution times (shows cost in steps and time): https://regex101.com/
Please feel free to weigh in or provide suggestions on improvements (I am by no means a RegEx master). As I further vet it against email detection of Body content, I will update if other FP cases are observed or more efficient RegEx is derived.
Seth
Upvotes: 0
Reputation: 1816
Based on the description here: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki I would say the regex for a Bech32 bitcoin address for Version 1 and Version 0 (only for mainnet) is:
\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b
Here are some other links where I found infos:
Upvotes: 2
Reputation: 363
^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
Based on the new address type Bech32
Upvotes: 16
Reputation: 165
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
bitcoin address is
O
, uppercase letter I
, lowercase letter l
, and the number 0
are never used to prevent visual ambiguity.Upvotes: 14
Reputation: 318568
[^OIl]
matches any character that's not O, I or l. The problems in your regex are:
$
at the end, so it'd match any string beginning with a BC address.{27,34}
- that should be {26,33}
However, as mentioned in a comment, a regex is not a good way to validate a bitcoin address.
Upvotes: 13