Reputation: 13
We have GA data in BigQuery, and some of my users want to join it with in-house data in Hadoop, which we cannot move to BigQuery.
What is the best way to do this?
Upvotes: 0
Views: 1791
Reputation: 7795
You could follow the route of the Hadoop connector, as Felipe Hoffa suggested, or build your own application that transfers data from BigQuery to your Hadoop cluster. Either way, you will be able to perform the required joins on the Hadoop cluster using Pig, Hive, etc.
In case you want to try the application method, let me take you through a process flow which your application may need to follow:
Let me know if you need any more details or clarification. I went down this route because I found the connector alternative a little too complex, but that is a subjective opinion and will vary from person to person.
Upvotes: 1
Reputation: 59325
See BigQuery to Hadoop Cluster - How to transfer data?:
The easiest way to go from BigQuery to Hadoop is to use the official Google BigQuery Connector for Hadoop:
https://cloud.google.com/hadoop/bigquery-connector
This connector defines a BigQueryInputFormat class.
(It uses Google Cloud Storage as an intermediary between BigQuery's data and the splits that Hadoop consumes)
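To illustrate the idea of Cloud Storage as the intermediary, here is a minimal sketch of doing the same hop by hand: export the table to GCS with `bq extract`, then pull the files into HDFS with `hadoop distcp`. This is not what the connector does internally, only the same two-step shape. The table name, bucket, and HDFS path are hypothetical, the helper `build_transfer_commands` is made up for this example, and the `distcp` step assumes the cluster already has the Cloud Storage connector installed so that `gs://` paths resolve.

```python
import shlex


def build_transfer_commands(table, gcs_prefix, hdfs_dir):
    """Build the two shell commands for a manual BigQuery -> GCS -> HDFS copy.

    All arguments are placeholders chosen by the caller; nothing is executed here.
    """
    # Step 1: export the BigQuery table as sharded CSV files into Cloud Storage.
    # The single '*' lets BigQuery split a large table across several files.
    extract = [
        "bq", "extract",
        "--destination_format=CSV",
        table,
        f"{gcs_prefix}/part-*.csv",
    ]
    # Step 2: copy the exported files from Cloud Storage into HDFS.
    # Assumption: the Hadoop cluster can read gs:// paths.
    distcp = ["hadoop", "distcp", gcs_prefix, hdfs_dir]
    return [extract, distcp]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    commands = build_transfer_commands(
        table="mydataset.ga_sessions",
        gcs_prefix="gs://my-bucket/ga_export",
        hdfs_dir="hdfs:///data/ga_export",
    )
    for cmd in commands:
        # Print rather than run, so the commands can be reviewed first.
        print(shlex.join(cmd))
```

Once the files are in HDFS, they can be exposed as a Hive or Pig input and joined with the in-house data there.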
Upvotes: 1