Reputation: 135
I'm currently writing a shell script that queries some Hive tables for record counts per month, for a list of tables, and then extracts the total counts to a .txt file. I currently have code that queries all the tables on a yearly basis, but how can I best have it loop per month within each year as well?
For example, right now my script loops through each year I pass (years=2001,2002,2003,...), queries my tables, and extracts the files. I would like it to loop per month for each year, so that there would ideally be 12 files per year, and continue looping for whatever years I assign.
Example pseudocode below for what I currently have:
#!/usr/bin/sh
years=2001,2002,2003,2004
for year in $(echo ${years} | sed "s/,/ /g")
do
select_sql="INSERT OVERWRITE LOCAL DIRECTORY <path> ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' select * from tbl where year(date)=$year"
beeline -u "<jdbc connection>" --hiveconf -e "$select_sql"
done
Upvotes: 0
Views: 870
Reputation: 4038
This question has a bash tag, but the interpreter is /usr/bin/sh. Anyway, let's use bash.
#!/bin/bash
# Brace expansion yields one WHERE clause per year/month pair (4 x 12 = 48).
# Hive's month() returns an integer 1-12, so compare against numbers, not names.
for clause in "year(date)="{2001,2002,2003,2004}" and month(date)="{1..12}
do
select_sql="INSERT OVERWRITE LOCAL DIRECTORY <path> ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' select * from tbl where $clause"
echo "$select_sql"
#beeline -u "<jdbc connection>" --hiveconf -e "$select_sql"
done
Upvotes: 1
Reputation: 3593
A nested loop and an array in this case could be something like:
years=2001,2002,2003,2004
months=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
for year in $(echo ${years} | sed "s/,/ /g")
do
for month in "${months[@]}"
do
# do the query and file saving here
echo "$year $month"
done
done
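To fill in the placeholder comment, here is a minimal sketch of what the loop body might look like. It assumes bash, numeric months (Hive's month() returns an integer from 1 to 12), and one output directory per year/month pair so each run does not overwrite the previous one. The build_query helper and the /tmp/hive_export base directory are hypothetical names used only for illustration; the beeline call is left commented out, as in the question, and drops --hiveconf since no property is being set.

```bash
#!/bin/bash
# Hypothetical helper: build the export statement for one year/month pair.
# base_dir stands in for the <path> placeholder from the question.
build_query() {
  local year=$1 month=$2 base_dir=$3
  local out_dir
  # Zero-pad the month so directories sort naturally, e.g. 2001-03.
  out_dir=$(printf '%s/%s-%02d' "$base_dir" "$year" "$month")
  printf "INSERT OVERWRITE LOCAL DIRECTORY '%s' ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' select * from tbl where year(date)=%s and month(date)=%s\n" \
    "$out_dir" "$year" "$month"
}

years=2001,2002,2003,2004
# Split the comma-separated list into an array without calling sed.
IFS=',' read -r -a year_list <<< "$years"

for year in "${year_list[@]}"
do
  for month in {1..12}
  do
    # /tmp/hive_export is an assumed location; replace it with the real path.
    select_sql=$(build_query "$year" "$month" "/tmp/hive_export")
    echo "$select_sql"
    # beeline -u "<jdbc connection>" -e "$select_sql"
  done
done
```

Run as-is, this only prints the 48 statements, so the generated SQL can be checked before uncommenting the beeline line.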
Upvotes: 0