Apply function to spark RDD

Question

I'm trying to do some analysis on tweets. I want to apply .lower() to every text in the tweets. I used the following code:

    actual_tweets = actual_tweets.map(lambda line: line["text"].lower() and line["quoted_status"]["text"].lower() if 'quoted_status' in line else line["text"].lower()).collect()

The problem is that, since I'm using map, this line of code converts the text attribute to lowercase and returns only the text attribute, ignoring all the others, which is not what I want. Is there a Spark transformation that can help me achieve this?

zero323 · Accepted Answer

You can, for example, return a tuple of (input, transformed_input):

    def transform(line):
        if 'quoted_status' in line:
            return (
                # Is `and` what you really want here?
                line, line["text"].lower() and line["quoted_status"]["text"].lower()
            )
        else:
            return line, line["text"].lower()

    actual_tweets.map(transform)
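
If you would rather keep every attribute and just lowercase the text fields in place, one option is to return a modified copy of each record from map. This is only a sketch: it assumes each element of the RDD is a plain Python dict (for example, parsed from JSON), and the names lowercase_texts, sample and lowered are made up for illustration.

```python
import copy


def lowercase_texts(tweet):
    """Return a copy of the tweet with its text fields lowercased.

    Assumes `tweet` is a dict with a "text" key and, optionally,
    a "quoted_status" dict that has its own "text" key.
    """
    result = copy.deepcopy(tweet)  # leave the input record untouched
    result["text"] = result["text"].lower()
    quoted = result.get("quoted_status")
    if quoted is not None and "text" in quoted:
        quoted["text"] = quoted["text"].lower()
    return result


# Hypothetical usage on the RDD from the question:
# actual_tweets = actual_tweets.map(lowercase_texts)

# Plain-Python check of the function on a made-up record:
sample = {"id": 1, "text": "Hello World", "quoted_status": {"text": "QUOTED Text"}}
lowered = lowercase_texts(sample)
```

On the `and` in the original expression: in Python, `a and b` evaluates to `b` whenever `a` is truthy, so for a tweet with non-empty text it yields only the quoted text, not both strings. Handling the two fields separately, as above, avoids that.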
