Reputation: 468
i have a table which looks like this : coumn 1 = timestamp : string , column 2 = numOfentites : int please note i am using hiveql
Fri, 10 Aug 2001 274
Fri, 10 Dec 1999 39
Fri, 10 Mar 2000 107
Fri, 10 May 2002 26
Fri, 10 Nov 2000 351
Fri, 10 Sep 1999 22
Fri, 11 Aug 2000 189
Fri, 11 Dec 1998 1
Fri, 11 Feb 2000 84
Fri, 11 Jan 2002 580
Fri, 11 Jun 1999 12
Fri, 11 May 2001 571
Fri, 12 Apr 2002 41
Now, I retrieved the frequency per year from this table and found out some year XXXX had the most number of entities.
My aim now is to go one level deep and extract the frequency per month for the year XXXX.
I tired using the group by clause on the substring indicating month but it doesn’t work.
can you guys please give me a direction on how to proceed..
Just need a hint not the answer :P trying to learn hiveql here
EDIT here is the query that i used to extract the frequency of entities on yearly basis. note that timestamp is the first column of the input.
select dates , count(dates) as numEmails
from (select split(timestamp," ")[3] as dates , count(timestamp)
from dataset
group by timestamp
) mailfreq
group by dates
order by numEmails desc;
Upvotes: 0
Views: 1046
Reputation: 1270301
I know that hivesql has strange limitations, but won't this work?
select split(timestamp," ")[3] as yr, split(timestamp," ")[2] as mon, count(timestamp)
from dataset
group by split(timestamp," ")[3], split(timestamp," ")[2];
Upvotes: 0