user151841
user151841

Reputation: 18066

Selecting multiple substrings from a field in MySQL

I have a field that is a longtext in MySQL. I'm looking for any instances of 'media' that could be in it, +/- ~10 characters of context. There are usually multiple instances in a single rows' field, so I need to see the context. How can I write a query to do this? I can't even think of where to start.

So what I'm looking at is this:

SELECT field_data_body FROM table WHERE field_data_body LIKE '%media%';
+----------------------------------+
| field_data_body                  |
+----------------------------------+
| ... ode__media_or ... e immediat | 
+----------------------------------+

The field is actually a long string, and I just parsed the actual test value to show the substrings that would match the WHERE clause.

What I actually want to see is all instances of the string media, which in the example above is two, but in other fields could be more. SUBSTR only shows the first instance of media.

Upvotes: 0

Views: 2194

Answers (3)

Salman Arshad
Salman Arshad

Reputation: 272416

CREATE FUNCTION of your own. Inside the function you can use the WHILE statement and general string functions such as LOCATE and SUBSTRING.

Here is an example to get you started:

DELIMITER $$

CREATE FUNCTION substring_list(
    haystack TEXT,
    needle VARCHAR(100)
)
RETURNS TEXT
DETERMINISTIC
BEGIN
    DECLARE needle_len INT DEFAULT CHAR_LENGTH(needle);
    DECLARE output_str TEXT DEFAULT '';
    DECLARE needle_pos INT DEFAULT LOCATE(needle, haystack);
    WHILE needle_pos > 0 DO
        SET output_str = CONCAT(output_str, SUBSTRING(haystack, GREATEST(needle_pos - 10, 1), LEAST(needle_pos - 1, 10) + needle_len + 10), '\n');
        SET needle_pos = LOCATE(needle, haystack, needle_pos + needle_len);
    END WHILE;
    RETURN output_str;
END$$

DELIMITER ;

Here are some tests. For each match, the term ("media") and up to 10 characters on either side are returned, all concatenated in a single string:

SELECT substring_list('1234567890media12345678immediate34567890media1234567890', 'media');
+---------------------------+
| 1234567890media12345678im |
| 12345678immediate34567890 |
| te34567890media1234567890 |
+---------------------------+
SELECT substring_list('0media12345678immediate34567890media1', 'media');
+---------------------------+
| 0media12345678im          |
| 12345678immediate34567890 |
| te34567890media1          |
+---------------------------+

Upvotes: 2

Mudassar Ali
Mudassar Ali

Reputation: 126

In mysql you can create a user define function for this like wordcount. You can get help from this UDF.

mysql count word in sql syntax

Upvotes: 1

Devon Bessemer
Devon Bessemer

Reputation: 35367

Here is a solution using PHP that will return each row and each result plus the surrounding characters in a multidimensional array.

$value = "media";
$surroundingChars = 5;
$strlen = strlen($value);

$stmt = $pdo->prepare("SELECT field_data_body FROM table WHERE field_data_body LIKE ?";
$stmt->execute([ '%'.$value.'%' ]);
$result = 0;
while ($body = $stmt->fetchColumn()) {
    $start = 0;
    while (($pos = stripos($body, $value, $start)) !== FALSE) {
         $return[$result][] = substr($body, $pos - $surroundingChars, $strlen + ($surroundingChars * 2));
         // Adjust next start 
         $start = $pos + $strlen;
    }
    $result++;
}

You could always change the $return[$result][] line, but to echo all rows in the format you wanted, you could do this:

foreach($return as $row) {
    echo implode('..', $row);
}

As you stated in the comments, you'd rather a query, but if you change your mind, here is a solution matching your PHP requirements.

Upvotes: 0

Related Questions