radhika sharma

Reputation: 570

PySpark: TypeError: unsupported operand type(s) for +: 'int' and 'str'

I am a beginner learning PySpark and I am getting the error in the title. I have looked at similar questions and tried what is suggested here, but it still doesn't help: https://stackoverflow.com/questions/20441035/unsupported-operand-types-for-int-and-str

Here is the relevant part of my code:

age=lines.map(lambda x: x.split(',')[2])
friends=lines.map(lambda x: x.split(',')[3])

rdd=lines.map(lambda x: int(x.split(',')[2]) +","+ int(x.split(',')[3]))


totalsByAge = rdd.mapValues(lambda x: (x, 1)).reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))
averagesByAge = totalsByAge.mapValues(lambda x: x[0] / x[1])
results = averagesByAge.collect()
for result in results:
    print(result)

I convert the values to int inside map, but I still get this error:

    rdd=lines.map(lambda x: int(x.split(',')[2]) +","+ int(x.split(',')[3]))
TypeError: unsupported operand type(s) for +: 'int' and 'str'

I also tried removing the "+", but I can't get the syntax right.

Upvotes: 2

Views: 4298

Answers (1)

Robert Kossendey

Reputation: 7028

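You are adding integers and strings, which cannot be done in Python: in int(...) + "," the left operand is an int and the right one is a str. Concatenating the two fields as strings and casting the result would not work either, because int("33,385") raises a ValueError. Since mapValues and reduceByKey operate on (key, value) pairs, you don't need a string at all. Cast each field to int and return the two values as a tuple:

rdd=lines.map(lambda x: (int(x.split(',')[2]), int(x.split(',')[3])))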
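One way to produce the (key, value) pairs that mapValues and reduceByKey operate on is to parse each line into a tuple of ints. Below is a minimal sketch of that idea, runnable without Spark. It assumes each line looks like id,name,age,friends (inferred from the indices 2 and 3 in the question). The helper name parse_line and the sample rows are made up for illustration.

```python
from collections import defaultdict


def parse_line(line):
    """Return (age, num_friends) as a tuple of ints from one CSV line."""
    fields = line.split(',')
    return int(fields[2]), int(fields[3])


# Hypothetical sample rows; the real file is not shown in the question.
sample_lines = ["0,Will,33,385", "1,Jean,33,2", "2,Hugh,55,221"]

pairs = [parse_line(line) for line in sample_lines]

# Plain-Python stand-in for mapValues + reduceByKey:
# keep a running (sum, count) per age.
totals = defaultdict(lambda: (0, 0))
for age, friends in pairs:
    total, count = totals[age]
    totals[age] = (total + friends, count + 1)

# Equivalent of the final mapValues: sum / count per age.
averages = {age: total / count for age, (total, count) in totals.items()}
print(averages)

# With Spark, the same parser should plug into the pipeline from the question:
# rdd = lines.map(parse_line)
```

Splitting the line once inside a named function also avoids calling x.split(',') twice per record, and it is easier to test than a lambda.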

Upvotes: 2
