Combine words into lines

Question

I have a text file and need to keep only the lines that have more than 6 words. I create the RDD:

my_data = sc.textFile("lines.txt")

Then I split each line into words:

line_words = my_data.map(lambda x: x.split(' '))

Then I apply a filter and save the results to a file:

filtered_lines = line_words.filter(lambda x: len(x) > 6)
filtered_lines.saveAsTextFile("out")

From this initial file:

hello world
its fun to have fun but you have to know how

I get:

[u'its', u'fun', u'to', u'have', u'fun', u'but', u'you', u'have', u'to', u'know', u'how']

How do I combine the words back into a line, without the brackets and the u'' prefixes?

I know it would be better to do something like this:

my_data.filter(lambda x: len(x.split(' ')) > 6).saveAsTextFile("out")

But I want to learn how to make the results human-readable.

Magnus Buvarp · Accepted Answer

You can use the string.join(array) method to join the elements of a list into a single string, where string is the delimiter placed between the elements:

line = [u'its', u'fun', u'to', u'have', u'fun', u'but', u'you', u'have', u'to', u'know', u'how']
sentence = " ".join(line)

Is this what you mean?
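To tie this back to the pipeline in the question, here is a minimal sketch that uses plain Python lists in place of the RDD so it runs without Spark. The helper names split_words, has_more_than_six and join_words are made up for illustration. The commented line at the end shows how the same join step could probably be added to the original RDD code; I have not run it against a cluster.

```python
# Sketch only: plain-Python stand-ins for the RDD steps in the question.
# All function names here are hypothetical, chosen for illustration.

def split_words(line):
    # Same as the question's map step: split on single spaces.
    return line.split(' ')

def has_more_than_six(words):
    # Same as the question's filter step.
    return len(words) > 6

def join_words(words):
    # The step from the answer: glue the words back together with spaces.
    return " ".join(words)

lines = ["hello world", "its fun to have fun but you have to know how"]

# split -> filter -> join, mirroring map -> filter -> map on an RDD
result = [join_words(w) for w in map(split_words, lines) if has_more_than_six(w)]
print(result)

# Assumed equivalent with the RDD from the question (not executed here):
# line_words.filter(lambda x: len(x) > 6) \
#           .map(lambda x: " ".join(x)) \
#           .saveAsTextFile("out")
```

Because each record is a plain string again after the join, the saved output should contain the original sentence rather than the printed form of a list.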
