Code_Help

Reputation: 303

Apache Superset Connection to MariaDB via Spark

I would like to view data from MariaDB in Superset. I think routing the data MariaDB -> Spark -> Superset might be the best solution, because I will also use Spark with H2O Sparkling Water.

1. I have tried pip3 install mysqlclient but got this error:

ERROR: Command "/bin/python3 -u -c 'import setuptools, tokenize; ... failed with error code 1 in /tmp/pip-install-kslmastj/mysqlclient/

2. I tried Spark with two configuration files, but I do not think the data is accessible in Spark or Spark SQL.

File 1 ../conf/spark-defaults.conf

spark.driver.extraClassPath = /usr/share/java/mysql-connector-java.jar
spark.executor.extraClassPath = /usr/share/java/mysql-connector-java.jar

File 2 ../conf/hive-site.xml

<configuration>
   <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/DBNAME</value>
      <description>JDBC connect string for a JDBC metastore</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>USERNAME</value>
      <description>username to use against metastore database</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>PASSWORD</value>
      <description>password to use against metastore database</description>
   </property>
</configuration>

3. I read about the SQLAlchemy dialects and PyHive. I searched through the Superset code but cannot determine where to add the external dialects.

4. I have tried a few configurations in the Superset config file, and I am wondering whether the port should be the Spark port: SQLALCHEMY_DATABASE_URI = 'hive://localhost:4040/'

5. I attempted to import a CSV file but got an error.

NOTE: I can see the MariaDB data in Spark if I enter this at the Scala prompt, but I don't think this is the proper solution.

import org.apache.spark.sql.SQLContext
val sqlcontext = new SQLContext(sc)
val dataframe_mysql = sqlcontext.read.format("jdbc").
  option("url", "jdbc:mysql://localhost:3306/DATABASE_NAME").
  option("driver", "com.mysql.jdbc.Driver").
  option("dbtable", "TABLE_NAME").
  option("user", "USER_NAME").option("password", "PASSWORD").load()
dataframe_mysql.show()

Upvotes: 3

Views: 2012

Answers (2)

Jonay C. P.

Reputation: 31

I have connected MariaDB to Superset as my main DB instead of the default SQLite, so this will probably help you connect.

SQLAlchemy needs an extra Python library for this task; in my case I used pymysql. Once it is installed (you can use pip), the connection string takes the prefix mysql+pymysql, so it looks like this:

'mysql+pymysql://user:password@host/dbname'
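
If the password contains characters such as @ or /, the URI has to be percent-encoded or SQLAlchemy will misparse it. Below is a minimal sketch of how this could look in superset_config.py, assuming pymysql is installed; the helper name mysql_uri and all credential values are placeholders I made up for illustration, not part of Superset.

```python
# superset_config.py (sketch): every credential below is a placeholder
from urllib.parse import quote


def mysql_uri(user, password, host, dbname, port=3306):
    """Hypothetical helper: build a mysql+pymysql URI with escaped credentials."""
    # safe="" forces characters like "@" and "/" to be percent-encoded
    return (
        f"mysql+pymysql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


# Superset reads this setting for its own metadata database
SQLALCHEMY_DATABASE_URI = mysql_uri("user", "p@ss/word", "localhost", "dbname")
```

The same URI format should also work when adding MariaDB as a data source from the Superset UI, since both go through SQLAlchemy.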

Upvotes: 2

TylerH

Reputation: 21066

Migrating OP's partial solution from question to an answer:

It is not the ideal solution, but I was able to get pip3 install mysqlclient to work after reading the solution at this link.

sudo ln -s /usr/lib64/libmariadbclient.a /usr/lib64/libmariadb.a

Upvotes: 0
