Reputation: 135
I have a string in a text column. I want to extract the hashtag values from the string into a new table so that I can find the distinct count for each hashtag.
Example strings->
NeverTrump is never more. They were crushed last night in Cleveland at Rules Committee by a vote of 87-12. MAKE AMERICA GREAT AGAIN!
CrookedHillary is outspending me by a combined 31 to 1 in Florida, Ohio, & Pennsylvania. I haven't started yet!
CrookedHillary is not qualified!
MakeAmericaSafeAgain!#GOPConvention #RNCinCLE
MakeAmericaGreatAgain #ImWithYou
Upvotes: 2
Views: 549
Reputation: 38290
This is what @lazilyInitialised said, I did a query with your data example:
with your_data as (--This is your data example, use your table instead of this CTE
select stack( 1,
1, --ID
" NeverTrump is never more. They were crushed last night in Cleveland at Rules Committee by a vote of 87-12. MAKE AMERICA GREAT AGAIN!
CrookedHillary is outspending me by a combined 31 to 1 in Florida, Ohio, & Pennsylvania. I haven't started yet!
CrookedHillary is not qualified!
MakeAmericaSafeAgain!#GOPConvention #RNCinCLE
MakeAmericaGreatAgain #ImWithYou
"
) as (id, str)
)
select id, word as hashtag
from
(
select id, word
from your_data d
lateral view outer explode(split(regexp_replace(d.str, '#',' #' ),'\\s')) l as word --replace hash w space+hash, split and explode words
)s
where word rlike '^#'
;
Result:
OK
id hashtag
1 #GOPConvention
1 #RNCinCLE
1 #ImWithYou
Time taken: 0.405 seconds, Fetched: 3 row(s)
Upvotes: 2
Reputation: 108
I am outlining the steps here as I'm not that good with the query, may update the answer once I get it right
explode()
lateral view functionality to get all the words of the string.WHERE
condition to fetch records starting with "#". LIKE '#%'
condition should work.Upvotes: 2