Adam Knights
Adam Knights

Reputation: 2151

Mysql SQL to update URLs that do not have www

I have a million odd rows where most start

'http://www.' or 'https://www.'

but occasionally they start with no 'www.' - this may be correct but the website owner wants consistency throughout the data and thus I need to update the table to always have 'www.'

I'm struggling with the SQL to do this. I tried:

select * from the_million where URL like 'http://[!w]'

But that returns 0 records so I've fallen at the first hurdle of building up the SQL. I guess after I've got the records I want I'll then do a replace.

I'm happy to run this in two goes for each of http and https so no need for anything fancy there.

Upvotes: 1

Views: 66

Answers (3)

Álvaro González
Álvaro González

Reputation: 146430

This condition:

URL like 'http://[!w]'

... is identical to this one:

URL='http://[!w]'

because it doesn't contain any valid wildcard for MySQL LIKE operator. If you check the MySQL manual page you'll see that the only wildcards are % and _.

The W3Schools page where you read that [!charlist] is valid identifies the section as "SQL Wildcards" which is misleading or plain wrong (depending on how benevolent you feel). That's not standard SQL at all. The error messages returned by their "SQL Tryit Editor" suggest that queries run against a Microsoft Access database, thus it's only a (pretty irrelevant) SQL dialect.

My advice:

  1. Avoid W3Schools as reference site. Their info is often wrong and they apparently don't care enough to amend it.
  2. Always use the official manual of whatever DBMS engine you are using.

Last but not least, the good old www prefix is not a standard part of the HTTP protocol URIs (like http://); it's only a naming convention. Preppending it to an arbitrary list of URLs is like adding "1st floor" to all your customer addresses. Make sure your client knows that he's paying money to corrupt his data on purpose. And if he feels generous, you can propose him to replace all https: with http: as well.

Upvotes: 2

Raptor
Raptor

Reputation: 54212

You can try this query:

UPDATE the_million SET url=REPLACE(url, 'http://', 'http://www.')
WHERE url NOT LIKE 'http://www.%' AND url NOT LIKE 'https://www.%'

UPDATE the_million SET url=REPLACE(url, 'https://', 'https://www.')
WHERE url NOT LIKE 'http://www.%' AND url NOT LIKE 'https://www.%'

Search & replace in 2 queries.

Upvotes: 3

Satish Sharma
Satish Sharma

Reputation: 9635

try this

select * from the_million where URL not like 'http://www.%'

Upvotes: 3

Related Questions