Reputation: 157
I'm trying to do some analysis on tweets. I want to apply .lower() to every text in the tweets. I used the following code:
actual_tweets = actual_tweets.map(lambda line: line["text"].lower() and line["quoted_status"]["text"].lower() if 'quoted_status' in line else line["text"].lower()).collect()
The problem is that, since I'm using map, this line of code converts the text attribute to lowercase and returns only the text attribute, ignoring all the others, which is not what I want. Is there a Spark transformation that would let me lowercase the text while keeping the rest of each record?
Upvotes: 0
Views: 3947
Reputation: 330063
You can for example return a tuple of (input, transformed_input):
def transform(line):
    if 'quoted_status' in line:
        return (
            # Is `and` what you really want here?
            line, line["text"].lower() and line["quoted_status"]["text"].lower()
        )
    else:
        return line, line["text"].lower()

actual_tweets.map(transform)
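If you would rather keep each record in its original shape instead of pairing it with the transformed text, another option is to return a modified copy of the record. The following is only a sketch: it assumes each record is a plain Python dict (as the `line["text"]` access suggests), and `lowercase_texts` is a hypothetical helper name, not part of Spark. It also sidesteps the `and` issue: in Python, `a and b` evaluates to `b` whenever `a` is truthy, so the original expression yields only the quoted text for any non-empty tweet.

```python
import copy


def lowercase_texts(tweet):
    """Return a copy of `tweet` with its text fields lowercased.

    Every other key is preserved. Assumes `tweet` is a dict with a
    "text" key and, optionally, a nested "quoted_status" dict.
    """
    result = copy.deepcopy(tweet)  # leave the input record untouched
    result["text"] = result["text"].lower()
    if "quoted_status" in result:
        quoted = result["quoted_status"]
        quoted["text"] = quoted["text"].lower()
    return result


# Hand-made record for illustration only (not real tweet data).
sample = {
    "id": 1,
    "text": "Hello World",
    "quoted_status": {"id": 2, "text": "QUOTED Text"},
}
print(lowercase_texts(sample))

# With Spark, assuming actual_tweets is an RDD of such dicts:
# lowered_tweets = actual_tweets.map(lowercase_texts)
```

Because the function returns the whole record, the result of `map` keeps all the other attributes and can be passed on to later transformations unchanged.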
Upvotes: 2