User

Reputation: 65951

What punctuation characters are necessary for a city field?

I'm considering a regex to restrict punctuation in city names (worldwide). What would be a fairly inclusive whitelist of these?

I'm thinking:

 (space)
. period
- hyphen
' apostrophe

Also thinking maybe comma or slash but I don't have any examples. Are there others?

Upvotes: 7

Views: 8219

Answers (2)

user5203006

Reputation: 31

USPS standard address formatting calls for stripping all special characters except 'necessary' hyphens and dashes used in the primary and/or secondary street address lines and hyphens in the ZIP.

So if an address is:

John O'Toole
456 N 4-1/2 St
San José, CA 99999-4545

The post office prefers envelopes be labeled:

John O Toole
456 N 4 1/2 St
San Jose CA 99999-4545
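
A rough sketch of that kind of cleanup for the city part only, in Python. The function name `normalize_city` is made up for illustration, and the rules (drop accents, turn punctuation into spaces) only approximate the example above; this is not an official USPS routine.

```python
import re
import unicodedata


def normalize_city(city: str) -> str:
    """Approximate the label-style cleanup shown above (assumed rules, not USPS code)."""
    # Decompose accented letters (e.g. "é" becomes "e" + combining acute accent),
    # then drop the combining marks so only the base letters remain.
    decomposed = unicodedata.normalize("NFD", city)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not an ASCII letter, digit, or space with a space.
    ascii_only = re.sub(r"[^A-Za-z0-9 ]", " ", without_marks)
    # Collapse the runs of spaces left behind by removed punctuation.
    return re.sub(r" +", " ", ascii_only).strip()


print(normalize_city("San José,"))
print(normalize_city("O'Fallon"))
```

Note that this only handles letters that decompose into a base letter plus an accent; characters such as "ø" or "ß" would be replaced by a space, so a real implementation would need a transliteration table.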

Upvotes: 2

heptadecagram

Reputation: 898

Your list is about as inclusive a whitelist of punctuation as you will find in city names. Be aware, though, that the ASCII apostrophe (U+0027) may not be the codepoint you actually receive when someone types an apostrophe on their keyboard.
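
One way to cope with the apostrophe problem, sketched in Python, is to fold the common look-alikes to the ASCII apostrophe before validating. The name `fold_apostrophes` and the particular set of variants are my own assumptions for illustration; adjust the table to the data you actually see.

```python
# Apostrophe look-alikes mapped to the ASCII apostrophe (U+0027).
# This set is an assumption for illustration, not an exhaustive list.
APOSTROPHE_VARIANTS = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (typographic apostrophe)
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
}

# str.maketrans accepts a dict of single-character keys.
_APOSTROPHE_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def fold_apostrophes(text: str) -> str:
    """Return text with the listed apostrophe variants replaced by U+0027."""
    return text.translate(_APOSTROPHE_TABLE)


print(fold_apostrophes("Coeur d\u2019Alene"))
```

Whether folding is appropriate depends on the application: if you need to store the name exactly as entered, keep the original and use the folded copy only for validation or matching.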

Once you know the encoding of the submitted text, you can test whether a character falls in the Unicode General Punctuation block:

/\p{InGeneral_Punctuation}/

If you are limiting yourself to the Latin Extended-A block, match against that instead:

/\p{InLatin_Extended-A}/
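
The `\p{In...}` block syntax above is Perl-style and is not available everywhere. As a hedged sketch, Python's standard `re` module has no `\p{...}` support, so the same two checks can be written with the block ranges spelled out by hand (General Punctuation is U+2000 to U+206F, Latin Extended-A is U+0100 to U+017F). The function names are hypothetical.

```python
import re

# Unicode block ranges written out explicitly, because the standard `re`
# module does not understand \p{InGeneral_Punctuation} and friends.
GENERAL_PUNCTUATION = re.compile(r"[\u2000-\u206F]")
LATIN_EXTENDED_A = re.compile(r"[\u0100-\u017F]")


def has_general_punctuation(text: str) -> bool:
    """True if any character lies in the General Punctuation block."""
    return GENERAL_PUNCTUATION.search(text) is not None


def has_latin_extended_a(text: str) -> bool:
    """True if any character lies in the Latin Extended-A block."""
    return LATIN_EXTENDED_A.search(text) is not None


print(has_general_punctuation("Coeur d\u2019Alene"))  # typographic apostrophe U+2019
print(has_latin_extended_a("\u0141\u00f3d\u017a"))    # the Polish city name Lodz with diacritics
```

Keep in mind that Latin Extended-A does not include the accented letters of Latin-1 Supplement (for example "é", U+00E9), so a city like San José needs that block allowed as well.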

Also, ask yourself: What are the consequences of someone putting a funny character into my city name? Is that worse than the consequences of someone not being able to enter their correct address, if I exclude too much?

Upvotes: 2
