Ryan
Ryan

Reputation: 15270

How can I strip out urls from a string in MySQL?

I need to strip out urls from results of a SQL query. All urls look like http:// so no complex regex is necessary. But the urls can be anywhere in the string.

Examples:

alpha beta http://alphabeta.com/abc
gamma http://gammadel.com/def delta
http://epsilo.com/ghi epsilon theta

How can I strip these urls from my results to get the following?

alpha beta
gamma delta
epsilon theta

Notes:

  1. URLs (in my use case) do always start with http://.
  2. Only one URL is found per string
  3. Ideally solutions will not require additional libraries

Upvotes: 2

Views: 1862

Answers (3)

MaKR
MaKR

Reputation: 1892

+1 to Jakub (+10 for functionality, -9 for enjoyability) for his answer. I'm posting this derived abomination of his answer so I don't lose it, and may save hours of someone's time.

Take a part number field of varchar(255) and concat with a description field of varchar(255), trim, strip links, strip new lines, prevent part number from being duplicated in the description, and format it for insert into 2 char(30) fields that are concat...

SELECT SUBSTRING(
    TRIM(CONCAT(
        CONCAT(
            TRIM(REPLACE(REPLACE(REPLACE(REPLACE(
                pi.part_number, CONCAT('https://', SUBSTRING_INDEX(SUBSTRING_INDEX(pi.part_number, 'https://', -1), ' ', 1)), ''
            ), CONCAT('http://', SUBSTRING_INDEX(SUBSTRING_INDEX(pi.part_number, 'http://', -1), ' ', 1)), ''), '\n', ''), '\r', ''))
            , ' '
        ),
        TRIM(REPLACE(
            REPLACE(REPLACE(REPLACE(REPLACE(
                pi.description, CONCAT('https://', SUBSTRING_INDEX(SUBSTRING_INDEX(pi.description, 'https://', -1), ' ', 1)), ''
            ), CONCAT('http://', SUBSTRING_INDEX(SUBSTRING_INDEX(pi.description, 'http://', -1), ' ', 1)), ''), '\n', ''), '\r', ''),
            TRIM(pi.part_number), ''
        ))
    ))
    , 1, 30
) AS PDDSC1,
SUBSTRING(
    TRIM(CONCAT(
        CONCAT(
            TRIM(REPLACE(REPLACE(REPLACE(REPLACE(
                pi.part_number, CONCAT('https://', SUBSTRING_INDEX(SUBSTRING_INDEX(pi.part_number, 'https://', -1), ' ', 1)), ''
            ), CONCAT('http://', SUBSTRING_INDEX(SUBSTRING_INDEX(pi.part_number, 'http://', -1), ' ', 1)), ''), '\n', ''), '\r', ''))
            , ' '
        ),
        TRIM(REPLACE(
            REPLACE(REPLACE(REPLACE(REPLACE(
                pi.description, CONCAT('https://', SUBSTRING_INDEX(SUBSTRING_INDEX(pi.description, 'https://', -1), ' ', 1)), ''
            ), CONCAT('http://', SUBSTRING_INDEX(SUBSTRING_INDEX(pi.description, 'http://', -1), ' ', 1)), ''), '\n', ''), '\r', ''),
            TRIM(pi.part_number), ''
        ))
    ))
    , 31, 30
) AS PDDSC2
FROM purchase_request_items pi

Upvotes: 0

Jakub Matczak
Jakub Matczak

Reputation: 15656

What about

SELECT REPLACE(
  'alpha gamma http://gammadel.com/def delta beta', 
  CONCAT('http://', 
    SUBSTRING_INDEX(
      SUBSTRING_INDEX('alpha gamma http://gammadel.com/def delta beta', 'http://', -1),' ', 1)
  ),''
);

I've tested it for strings that you provided, but not sure if it meets fully your requirements.

Basically what this code does is:

  1. extract URL with SUBSTRING_INDEX() function
  2. replace URL with empty string in the original string.

Here's a full query to test each scenario:

SET @str1="foo bar http://foobar.com/abc";
SET @str2="foo http://foobar.com/def bar";
SET @str3="http://foobar.com/ghi foo bar";
SELECT
    REPLACE(
        @str1, 
        CONCAT('http://', 
            SUBSTRING_INDEX(
                SUBSTRING_INDEX(@str1, 'http://', -1),
                ' ', 1
            )
        ),''
    ) AS str1,
    REPLACE(
        @str2, 
        CONCAT('http://', 
            SUBSTRING_INDEX(
                SUBSTRING_INDEX(@str2, 'http://', -1),
                ' ', 1
            )
        ),''
    ) AS str2,
    REPLACE(
        @str3, 
        CONCAT('http://', 
            SUBSTRING_INDEX(
                SUBSTRING_INDEX(@str3, 'http://', -1),
                ' ', 1
            )
        ),''
    ) AS str3
;

Returns (as expected) :

foo bar
foo bar
foo bar

Upvotes: 3

ʰᵈˑ
ʰᵈˑ

Reputation: 11375

As you cannot use functions like preg_replace without having any addons/libraries - and as you've only tagged your question with mysql/sql, you're going to need to install this to give you the ability to use a regular expression replace. https://github.com/mysqludf/lib_mysqludf_preg#readme

Now that's installed, you can run;

SELECT CONVERT( 
   preg_replace('/(http:\/\/[^ \s]+)/i', '', foo)  
USING UTF8) AS result 
FROM `bar`;

This will give results like: https://regex101.com/r/qX6jB8/1

Upvotes: 2

Related Questions