In Scala, we would write an RDD to Redis like this:
import com.redis.RedisClient  // assuming the scala-redis client

datardd.foreachPartition(iter => {
  // One Redis connection per partition, reused for every element.
  val r = new RedisClient("hosturl", 6379)
  iter.foreach(i => {
    val (str, it) = i
    val map = it.toMap
    r.hmset(str, map)  // write the map as a Redis hash under key str
  })
})
I tried doing the same thing in PySpark with datardd.foreachPartition(storeToRedis), where the function storeToRedis is defined as:
def storeToRedis(x):
    r = redis.StrictRedis(host='hosturl', port=6379)
    for i in x:
        r.hmset(i[0], dict(i[1]))  # hash write, matching the Scala hmset
It gives me this error:
ImportError: ('No module named redis', function subimport at 0x47879b0, ('redis',))
Of course, I have imported redis.
Upvotes: 5
Views: 7314
PySpark's SparkContext has an addPyFile method for exactly this purpose: distributing Python dependencies to the worker nodes. Make the redis module a zip file (like this) and just call this method:
from pyspark import SparkContext

sc = SparkContext(appName="analyze")
sc.addPyFile("/path/to/redis.zip")  # ship redis.zip to every executor
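Putting it together, a minimal end-to-end sketch might look like the following (the host, the zip path, and the sample data are placeholders, not from the original post):

from pyspark import SparkContext

sc = SparkContext(appName="analyze")
sc.addPyFile("/path/to/redis.zip")  # make the redis package importable on executors

def storeToRedis(partition):
    import redis  # resolved from redis.zip on the workers
    # One connection per partition; write each element as a Redis hash.
    r = redis.StrictRedis(host='hosturl', port=6379)
    for key, values in partition:
        r.hmset(key, dict(values))

# Placeholder sample data: (key, iterable of field/value pairs).
datardd = sc.parallelize([("user:1", [("name", "alice"), ("age", "30")])])
datardd.foreachPartition(storeToRedis)

Importing redis inside the function keeps the import on the worker side, where redis.zip is already on sys.path.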
Upvotes: 7