Loading data into dictionary

Question

I enrolled in a data science course on Coursera, and one of the assignments led me to write this code.

import sys
import json
import re

def lines(fp):
    print str(len(fp.readlines()))

def main():
        tweet_file = open(sys.argv[1])
        word_frequency_count = {}

        for line in tweet_file:
                raw_data = json.loads(line)
                #print raw_data
                text = raw_data.get('text', "").lower().encode('utf-8')
                new_text=re.findall(r"[\w']+", text)
                print new_text
                #print text
        for word in new_text:
                word_frequency_count[word] = 'Test'

        print word_frequency_count.items()


if __name__ == '__main__':
    main()

The print statement for new_text outputs lines like these. There are thousands of such results; this is just a sample.

['rt', 'fuadagus2', 'presiden', 'sby', 'belilah', 'nuklir', 'kpd', 'korut', 'luncurkan', 'ke', 'israel', 'tunjukan', 'kalau', 'kamu', 'islam', 'prayforgaza']
['not', 'letting', 'nothing', 'else', 'get', 'in', 'my', 'way']

What bothers me is the last print of the dictionary's key-value pairs, which outputs only 5 pairs. I am a Java developer and this is my first foray into Python. Am I missing something obvious here?

jonrsharpe · Accepted Answer

Your indentation is wrong:

for line in tweet_file:
    ...
for word in new_text:
    word_frequency_count[word] = 'Test'

The second loop runs outside the first loop, so it only processes the new_text list from the last line in tweet_file. It should instead be:

for line in tweet_file:
    ...
    for word in new_text:
        word_frequency_count[word] = 'Test'

However, note that Python comes with "batteries included"; in this case, collections.Counter will make your life much easier.
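
As a rough sketch of the collections.Counter suggestion: the example below is written in Python 3 syntax (the question's code is Python 2), and count_words and sample_lines are hypothetical names used only for illustration. It assumes each input line is one JSON object, as in the question, and it counts occurrences rather than storing 'Test' for each word.

```python
import json
import re
from collections import Counter


def count_words(lines):
    """Count word occurrences in the 'text' field of JSON-encoded lines."""
    word_frequency_count = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        raw_data = json.loads(line)
        text = raw_data.get('text', '').lower()
        # Same tokenising pattern as in the question; update() adds one
        # count per word, so this happens for every line, not just the last.
        word_frequency_count.update(re.findall(r"[\w']+", text))
    return word_frequency_count


# Hypothetical sample input standing in for the tweet file.
sample_lines = [
    '{"text": "Not letting nothing else get in my way"}',
    '{"text": "Get out of my way"}',
    '{"delete": {"status": {"id": 1}}}',  # no 'text' key, contributes nothing
]

counts = count_words(sample_lines)
print(counts['way'])  # 2
```

Since a file object yields its lines when iterated, the same function should also accept an open file, for example count_words(open(sys.argv[1])).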
