user3476463

Reputation: 4575

Pandas DataFrame to Hive Table

I'm new to Python and Hive.

I was hoping I might get some advice.

Does anyone have any tips on how to turn a Python pandas DataFrame into a Hive table?

Upvotes: 3

Views: 7635

Answers (2)

Rummble

Reputation: 1

Based on Jose Antonio Martin H's answer... I could not find an easy way of doing this. I've been unable to get pandas DataFrame.to_sql() working with the Cloudera ODBC driver. So, as mine is a one-off case, I manually exported the data with DataFrame.to_csv() and, once the file was on HDFS, used the HUE/Hive Importer tool on it.

Where Jose's answer helped me was in using a non-comma delimiter ("|" actually, rather than "," or "\t") and in turning the index off. These seemed to help the process.

I could not get the Parquet format to work, with or without compression, which I had thought to be the problem. Nor could I get "load data local inpath" to work.

Just my experience, if it helps. If I get any of it working programmatically, I'll try to let you know here.

(BTW I can't comment yet, but hopefully sharing my own experience here helps others in a predicament.)
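The export step described above can be sketched roughly as follows. This is only an illustration: the sample DataFrame and the commented-out file path are assumptions, not the data from my case.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# Pipe-delimited output with the index turned off.
# With no path argument, to_csv returns the text instead of writing a file.
text = df.to_csv(sep="|", index=False)

# To write a file instead (the path is an assumption):
# df.to_csv("/tmp/my_table.psv", sep="|", index=False)
```

The resulting file still has to be copied to HDFS before the HUE importer can see it.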

Upvotes: 0

Jose Antonio Martin H

Reputation: 1511

Your script should run on a machine where Hive can load data using the "load data local inpath" method.

  1. Query the pandas DataFrame to build a list of column names and data types.

  2. Compose a valid HQL (DDL) create table statement using python string operations (basically concatenations)

  3. Issue a create table statement in Hive.

  4. Write the pandas DataFrame as a CSV file separated by "\t", with headers off and index off (check the parameters of to_csv()).

  5. From your Python script, call a system console running hive -e.

For instance:


import subprocess

# str_command_list holds the HQL to run, e.g. a "load data local inpath ..." statement
p = subprocess.Popen(['hive', '-e', str_command_list],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()

This calls the Hive console and executes, for instance, a load data local inpath statement, which inserts your CSV data into the table you created.
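Steps 1 to 3 and the load statement could be composed along these lines. Treat it as a minimal sketch under stated assumptions: the dtype-to-Hive type mapping, the table name, the file path, and the helper names build_create_table and build_load_data are all hypothetical. For a real DataFrame df, the dtype mapping can be obtained with df.dtypes.astype(str).to_dict().

```python
# Hypothetical mapping from pandas dtype names to Hive column types
# (an assumption; adjust it for your data).
HIVE_TYPES = {
    "int64": "BIGINT",
    "float64": "DOUBLE",
    "bool": "BOOLEAN",
    "object": "STRING",
    "datetime64[ns]": "TIMESTAMP",
}


def build_create_table(table, dtypes):
    """Compose a CREATE TABLE statement for a tab-delimited text table."""
    columns = ", ".join(
        f"`{name}` {HIVE_TYPES.get(dtype, 'STRING')}"  # unknown dtypes fall back to STRING
        for name, dtype in dtypes.items()
    )
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({columns}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
        "STORED AS TEXTFILE"
    )


def build_load_data(table, csv_path):
    """Compose the statement that loads a local file into the table."""
    return f"LOAD DATA LOCAL INPATH '{csv_path}' INTO TABLE {table}"


# Column name -> dtype name, as df.dtypes.astype(str).to_dict() would give it.
dtypes = {"id": "int64", "score": "float64", "name": "object"}

ddl = build_create_table("my_table", dtypes)
load = build_load_data("my_table", "/tmp/my_table.tsv")

# A single string with both statements, suitable for passing to hive -e.
hql = f"{ddl}; {load};"
```

The hql string can then take the place of str_command_list in the Popen call. The data file itself would come from something like df.to_csv("/tmp/my_table.tsv", sep="\t", header=False, index=False).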

Then you are happy.

Upvotes: 1
