In Scala, we would write an RDD to Redis like this:
import com.redis.RedisClient  // assuming the scala-redis client

datardd.foreachPartition(iter => {
  // One Redis connection per partition, reused for every element.
  val r = new RedisClient("hosturl", 6379)
  iter.foreach(i => {
    val (str, it) = i
    val map = it.toMap
    r.hmset(str, map)  // write the map as a Redis hash under key str
  })
})
I tried doing the same thing in PySpark with datardd.foreachPartition(storeToRedis), where the function storeToRedis is defined as:
def storeToRedis(x):
    r = redis.StrictRedis(host='hosturl', port=6379)
    for i in x:
        r.hmset(i[0], dict(i[1]))  # hash write, matching the Scala hmset
It gives me this error:
ImportError: ('No module named redis', function subimport at 0x47879b0, ('redis',))
Of course, I have imported redis.
Upvotes: 5
Views: 7314
PySpark's SparkContext has an addPyFile method for exactly this purpose: distributing Python dependencies to the worker nodes. Make the redis module a zip file (like this) and just call this method:
from pyspark import SparkContext

sc = SparkContext(appName="analyze")
sc.addPyFile("/path/to/redis.zip")  # ship redis.zip to every executor
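Putting it together, a minimal end-to-end sketch might look like the following (the host, the zip path, and the sample data are placeholders, not from the original post):

from pyspark import SparkContext

sc = SparkContext(appName="analyze")
sc.addPyFile("/path/to/redis.zip")  # make the redis package importable on executors

def storeToRedis(partition):
    import redis  # resolved from redis.zip on the workers
    # One connection per partition; write each element as a Redis hash.
    r = redis.StrictRedis(host='hosturl', port=6379)
    for key, values in partition:
        r.hmset(key, dict(values))

# Placeholder sample data: (key, iterable of field/value pairs).
datardd = sc.parallelize([("user:1", [("name", "alice"), ("age", "30")])])
datardd.foreachPartition(storeToRedis)

Importing redis inside the function keeps the import on the worker side, where redis.zip is already on sys.path.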
Upvotes: 7