Reputation: 303
I'm working on data processing with Spark and Cassandra.
What I want to do is read and load the data from Cassandra first, process it, and write it back to Cassandra.
When Spark runs the map function, an error occurs: Row is read-only <class 'Exception'>
Here is my method:
def detect_image(image_attribute):
    image_id = image_attribute['image_id']
    image_url = image_attribute['image_url']
    if image_attribute['status'] is None:
        image_attribute['status'] = Status()
    image_attribute['status']['detect_count'] += 1
    ...  # the other item assignments
cassandra_data = sql_context.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="photo", keyspace="data") \
    .load()
cassandra_data_processed = cassandra_data.rdd.map(process_batch_image)
cassandra_data_processed.toDF().write \
    .format("org.apache.spark.sql.cassandra") \
    .mode('overwrite') \
    .options(table="photo", keyspace="data") \
    .save()
The error Row is read-only <class 'Exception'> is raised at the lines

image_attribute['status'] = Status()

and

image_attribute['status']['detect_count'] += 1

Is it necessary to copy image_attribute into a new object? image_attribute is a nested object, so copying it layer by layer would be very painful.
Upvotes: 1
Views: 505
Reputation: 2921
Your suspicion is right. The map function converts an incoming type into another type; that is at least the intention. The incoming object is made immutable to keep this operation idempotent, so I guess there is no way around copying the image objects (manually, or with something like copy.deepcopy).
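Rather than copying layer by layer, note that PySpark's Row has an asDict(recursive=True) method that converts the whole nested structure into plain, mutable dicts in one call. Here is a minimal sketch of detect_image rewritten that way; the {'detect_count': 0} default dict is a hypothetical stand-in for your Status class:

from pyspark.sql import Row

def detect_image(image_attribute):
    # Rows are immutable, so work on a mutable copy instead.
    # recursive=True also converts nested Rows (like 'status') to dicts.
    attrs = image_attribute.asDict(recursive=True)
    if attrs['status'] is None:
        attrs['status'] = {'detect_count': 0}  # hypothetical stand-in for Status()
    attrs['status']['detect_count'] += 1
    # ... the other item assignments, applied to the mutable copy
    return Row(**attrs)

The map then returns a fresh Row and toDF() can rebuild the DataFrame from it. Depending on your Spark version you may need to pass an explicit schema to toDF() so the nested types round-trip correctly.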
Hope that helps
Upvotes: 1