Reputation: 4109
Consider the following table of tweets
id tweet
------------------------------------------------------
1 alcoa inc stock analysis
2 bullrider has added alcoa inc to portfolio
3 caterpillar annual results
4 more at http://bit.ly/d3423 on caterpillar
5 making apple inc profits
I would like to replace company names with a specified label, as follows:
id tweet
------------------------------------------------------
1 {COMPANY|AA} stock analysis
2 bullrider has added {COMPANY|AA} to portfolio
3 {COMPANY|CAT} annual results
4 more at http://bit.ly/d3423 on {COMPANY|CAT}
5 making {COMPANY|AAPL} profits
I now have the following query:
UPDATE
tweets
SET
tweet = REPLACE(tweet, 'alcoa inc', '{COMPANY|AA}')
WHERE
tweet LIKE '% alcoa inc %'
OR
tweet LIKE 'alcoa inc %'
OR
tweet LIKE '% alcoa inc'
I have two questions, though:
array("alcoa inc" => "{COMPANY|AA}", "caterpillar" => "{COMPANY|CAT}", "apple inc" => "{COMPANY{AAPL}")
and loop through it in SQL for a mass-replace.Your help is appreciated :-)
Upvotes: 4
Views: 634
Reputation: 56779
You could try storing the list of tickers and companies in a table and use a query like this:
select
case
when c.ticker is not null then
replace(t.tweet, c.name,
concat('{COMPANY|',c.ticker,'}'))
else t.tweet
end as tweet
from
tweets t
left join
company c
on
t.tweet like concat('% ', c.name, ' %')
or
t.tweet like concat(c.name, ' %')
or
t.tweet like concat('% ', c.name)
group by
t.id
The only downside being this solution won't handle the case where two companies' names appear in the same tweet. Only one of them will get replaced.
Demo: http://www.sqlfiddle.com/#!2/8da9d/2
Edit: As pointed out by @Mario
, it's possible that the matching will cause an incorrect replacement if there are multiple instances in the same string. For instance, using the string I have a redapple from apple
with the replacement apple
will result in the string I have a red{COMPANY|AAPL} from {COMPANY|AAPL}
because the second apple
triggers a LIKE
match, then the REPLACE
replaces every match. I have an updated query that handles this situation as well by reading the matching strings out of a table as well:
select
case
when p.ticker is not null then
replace(t.tweet,
replace(p.replacestr, '{0}', p.name),
replace(p.replacestr, '{0}', concat('{COMPANY|',p.ticker,'}')))
else t.tweet
end as tweet
from
tweets t
left join
(select * from pattern,company) p
on t.tweet like replace(p.pattern, '{0}', p.name);
group by
t.id
Where Pattern
is a table defined like this:
CREATE TABLE Pattern
(pattern varchar(50), replacestr varchar(50));
INSERT INTO Pattern
(pattern, replacestr)
VALUES
('% {0} %', ' {0} '),
('{0} %', '{0} '),
('% {0}', ' {0}');
Demo: http://www.sqlfiddle.com/#!2/c71d4/3
Upvotes: 3
Reputation: 11955
One way of doing it is with a stored procedure:
CREATE PROCEDURE UpdateTweetsWithCompany
@CompanyName varchar(255) = 'alcoa inc',
@ReplaceValue varchar(255) = '{COMPANY|AA}'
AS
BEGIN
UPDATE
tweets
SET
tweet = REPLACE(tweet, @CompanyName, @ReplaceValue)
WHERE
tweet LIKE '% ' + @CompanyName + ' %'
OR
tweet LIKE @CompanyName + ' %'
OR
tweet LIKE '% ' + @CompanyName
END
And then call it, somewhat like this:
EXEC UpdateTweetsWithCompany 'alcoa inc', '{COMPANY|AA}'
Upvotes: 0
Reputation: 72676
Isn't there a better way to catch all possible "alcoa inc" instances?
REGEX(Reqular Expression) search. Take a look a the official mysql REGEX documentation for further information.
Is it possible in SQL to write a sort of an array for multiple replacements (in SQL, so not from within PHP). Here. I mean defining in SQL something like array("alcoa inc" => "{COMPANY|AA}", "caterpillar" => "{COMPANY|CAT}", "apple inc" => "{COMPANY{AAPL}") and loop through it in SQL for a mass-replace.
Yes, you can create a specific table to store the key->value dictionary and then get value from it, so fo example :
table : dictionary
+----------+-----------------------+---------------+
| name | pattern | replacement |
+----------+-----------------------+---------------+
|alcoa inc | [[:space:]]+alcoa inc |{COMPANY|AA} |
|apple inc | apple inc.* |{COMPANY{AAPL} |
And you can update with a query like the following, for example :
UPDATE tweets SET tweets = REPLACE(tweet, 'alcoa inc', '{COMPANY|AA}')
WHERE tweet RLIKE (SELECT pattern FROM dictionary WHERE name = 'alcoa inc');
It is only an example to illustrate a possible way, you have to tune it for your requirements and specific context.
Upvotes: 1