Stanleyrr
Stanleyrr

Reputation: 875

Unable to load data from a CSV file into HIVE

I am getting 'None' values while loading data from a CSV file into hive external table.
My CSV file structure is like this:

creation_month,accts_created
7/1/2018,40847
6/1/2018,67216
5/1/2018,76009
4/1/2018,87611
3/1/2018,99687
2/1/2018,92631
1/1/2018,111951
12/1/2017,107717

'creation_month' and 'accts_created' are my column headers.

create external table monthly_creation
(creation_month DATE,
 accts_created INT
 )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location '/user/dir4/'

The location is '/user/dir4/' because that's where I put the 'monthly_acct_creation.csv' file, as seen in the screenshot below:

enter image description here

I have no idea why the external table I created had all 'None' values when the source data have dates and numbers.
Can anybody help?

Upvotes: 1

Views: 2427

Answers (2)

notNull
notNull

Reputation: 31490

Date data type format in hive only accepts yyyy-MM-dd as your date field is not in the same format and that results null values in creation_month field value.

Create table with creation_month field as string datatype and skip the first line by using skip.header.line property in create table statement.

Try with below ddl:

hive> create external table monthly_creation
(creation_month string,
 accts_created INT
 )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Location '/user/dir4/'
tblproperties ("skip.header.line.count"="1");

hive> select * from monthly_creation;
+-----------------+----------------+--+
| creation_month  | accts_created  |
+-----------------+----------------+--+
| 7/1/2018        | 40847          |
| 6/1/2018        | 67216          |
| 5/1/2018        | 76009          |
| 4/1/2018        | 87611          |
| 3/1/2018        | 99687          |
| 2/1/2018        | 92631          |
| 1/1/2018        | 111951         |
| 12/1/2017       | 107717         |
+-----------------+----------------+--+

Upvotes: 1

OneCricketeer
OneCricketeer

Reputation: 191743

DATE values describe a particular year/month/day, in the form YYYY-­MM-­DD. For example, DATE '2013-­01-­01'.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-date

I suggest using string type for your date column, which you can convert later or parse into timestamps.

Regarding the integer column, you'll need to skip the header for all columns to be appropriately converted to int types


By the way, new versions of HUE allow you to build Hive tables directly from CSV

Upvotes: 1

Related Questions