scrpaingnoob

Reputation: 157

Apply a function to a Spark RDD

I'm trying to do some analysis on tweets. I want to apply .lower() to the text of every tweet. I used the following code:

    actual_tweets = actual_tweets.map(lambda line: line["text"].lower() and line["quoted_status"]["text"].lower() if 'quoted_status' in line else line["text"].lower()).collect()

The problem is that, because I'm using map, this line converts the text attribute to lowercase and returns only that attribute, discarding all the other fields, which is not what I want; I want to keep the rest of each tweet as well. Is there a Spark transformation that can help me achieve this?
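Because `map` replaces each element with whatever the function returns, returning only the lowercased string drops the rest of the tweet. A minimal sketch of a function that instead returns the whole record with only its text fields changed, assuming each element is a tweet dict parsed from JSON with a `"text"` key and an optional `"quoted_status"` dict. The name `lowercase_tweet` is hypothetical, and a plain list with Python's built-in `map` stands in for the RDD:

```python
import copy


def lowercase_tweet(tweet):
    """Return a copy of a tweet dict with its text fields lowercased.

    Hypothetical helper: assumes `tweet` is a dict with a "text" key and an
    optional "quoted_status" dict that has its own "text" key. All other
    fields are kept unchanged.
    """
    result = copy.deepcopy(tweet)  # avoid mutating the input record
    result["text"] = result["text"].lower()
    if "quoted_status" in result:
        result["quoted_status"]["text"] = result["quoted_status"]["text"].lower()
    return result


# Plain-Python stand-in for an RDD of tweet dicts
tweets = [
    {"id": 1, "text": "Hello WORLD"},
    {"id": 2, "text": "RT This", "quoted_status": {"text": "Quoted TEXT"}},
]
lowered = list(map(lowercase_tweet, tweets))
print(lowered)
```

With PySpark the same function would be passed as `actual_tweets.map(lowercase_tweet)`. The deep copy is a cautious choice so the function never modifies the record it receives.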

Upvotes: 0

Views: 3947

Answers (1)

zero323

Reputation: 330063

You can, for example, return a tuple of (input, transformed_input):

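    def transform(line):
        # Keep the original record and pair it with the lowercased text
        if 'quoted_status' in line:
            return (
                # Is `and` what you really want here?
                line, line["text"].lower() and line["quoted_status"]["text"].lower()
            )
        else:
            return line, line["text"].lower()

    # Each element becomes a (tweet, lowercased_text) pair
    actual_tweets.map(transform)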
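On the `and` question: in Python, `a and b` evaluates to `a` if `a` is falsy and to `b` otherwise, so for a non-empty tweet text the expression yields only the lowercased quoted text. A minimal sketch of a variant that keeps both texts, assuming each element is a tweet dict; the name `transform_both` is hypothetical, `None` marks a missing quoted tweet, and a plain list stands in for the RDD:

```python
def transform_both(line):
    """Pair a tweet dict with (lowercased text, lowercased quoted text or None).

    Hypothetical variant of `transform`: a tuple keeps both values, whereas
    `a and b` would return only `b` whenever `a` is a non-empty string.
    """
    text = line["text"].lower()
    if "quoted_status" in line:
        quoted = line["quoted_status"]["text"].lower()
    else:
        quoted = None
    return line, (text, quoted)


# `and` returns its second operand when the first is truthy
print(repr("abc" and "xyz"))  # 'xyz'
print(repr("" and "xyz"))     # ''

tweets = [
    {"text": "Hello"},
    {"text": "Main", "quoted_status": {"text": "QUOTED"}},
]
pairs = list(map(transform_both, tweets))
```

With Spark this would be `actual_tweets.map(transform_both)`. Since `map` is a lazy transformation, an action such as `.take(5)` or `.collect()` is needed before any pairs are actually computed.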

Upvotes: 2
