Reputation: 393
I am trying to write a query to split a column after every @ character. I then want to be able to do a count of these within each segment.
I have managed to write the following query in Hive:
SELECT
distinct split (msg_txt,'\\@')[0] AS first_msg, count(*)
FROM table1
;
but this won't allow me to add a group by in order to get a count. I tried doing this with a subquery:
SELECT first_msg, count(*)
FROM (
SELECT
distinct split (msg_txt,'\\@')[0] AS first_msg
FROM table1
)
GROUP BY first_msg
;
but this gives me the following error:
Error while compiling statement: FAILED: ParseException line 7:6 missing EOF at 'BY' near 'GROUP'
so not sure how I can write this query.
If somebody could please advise would really appreciate it.
Thanks in advance.
Upvotes: 0
Views: 2110
Reputation: 7947
base on your requirement, I am not sure why you are getting the first element. The query to ignore the first element of the split (considering you want to apply the group for all the elements after "@) should look like this
select value, count(*) from (
select
pos,value
from table1 lateral view posexplode(split (msg_txt,'\\@')) explodedcol as pos,value limit 10
) t where pos != 0 group by value
;
if you want to include all the elements splitting by "@", just remove the "post != 0" condition from the where clause.
Regards,
Upvotes: 1
Reputation: 1269623
I think you just need a table alias:
SELECT first_msg, count(*)
FROM (SELECT distinct split(msg_txt,'\\@')[0] AS first_msg
FROM table1
) t
GROUP BY first_msg;
Hive requires a table aliases:
The subquery has to be given a name because every table in a FROM clause must have a name.
In your version, it treats GROUP
as the name of the subquery. The BY
then doesn't make sense.
As written, this is a bit non-sensical because you can just do:
SELECT distinct split(msg_txt,'\\@')[0] AS first_msg, 1 as cnt
FROM table1;
The distinct
in the subquery will ensure that all values are unique. I assume your actual problem is a little more complicated.
Upvotes: 1