Malte Susen

Reputation: 845

JSON File: Separate Word Count for Different Objects with Python

For a current research project, I want to count the unique words in different fields of the objects in a JSON file. Ideally, the output file should show separate word count summaries (counting the occurrences of each unique word) for the texts in "Text Main", "Text Pro" and "Text Con". Is there a clean way to do this?

At the moment, I am receiving the following error message:

File "index.py", line 10, in <module>
text = data["Text_Main"]
TypeError: list indices must be integers or slices, not str

The JSON file has the following structure:

[
{"Stock Symbol":"A",
"Date":"05/11/2017",
"Text Main":"Text sample 1",
"Text Pro":"Text sample 2",
"Text Con":"Text sample 3"}
]

And the corresponding code looks like this:

# Import relevant libraries
import string
import json
import csv
import textblob

# Open JSON file and slice by object
file = open("Glassdoor_A.json", "r")
data = json.load(file)
text = data["Text_Main"]

# Create an empty dictionary
d = dict()

# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))

    # Split the line into words
    words = line.split(" ")

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

# Save results as CSV
with open('Glassdoor_A.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Word", "Occurrences", "Percentage"])
    writer.writerows([key, d[key]] for key in d)

Upvotes: 0

Views: 1359

Answers (2)

tayken

Reputation: 127

Your JSON file has an object inside a list. In order to access the content you want, first you have to access the object via data[0]. Then you can access the string field. I would change the code to:

# Open JSON file and slice by object
file = open("Glassdoor_A.json", "r")
data = json.load(file)
json_obj = data[0]
text = json_obj["Text Main"]

Alternatively, you can access that field in a single line with text = data[0]["Text Main"], as quamrana stated.
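If the file later holds more than one object, indexing only data[0] would skip the rest. A minimal sketch of collecting one field from every object in the list, assuming each record is a dict and using a hypothetical helper named collect_field (not part of the original code):

```python
import json


def collect_field(records, field):
    """Return the values of `field` from every dict in `records` that has it.

    `collect_field` is a hypothetical helper name used for illustration.
    """
    return [record[field] for record in records if field in record]


# Inline sample mirroring the structure shown in the question; in practice
# the list would come from json.load() on the opened file.
sample = json.loads('[{"Stock Symbol": "A", "Text Main": "Text sample 1"}]')

texts = collect_field(sample, "Text Main")
print(texts)
```

Records that lack the key are skipped here rather than raising a KeyError; whether that is the right behaviour depends on how clean the data is.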

Upvotes: 1

quamrana

Reputation: 39354

There are two problems. First, the key should be "Text Main" (with a space, not an underscore). Second, json.load returns a list here, so you need to access the first dict in the list before looking up the key. Extract the text variable like this:

text = data[0]["Text Main"]

This should fix the error message.
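Once the lookup works, note that text is a single string, so looping over it with "for line in text" would walk through characters rather than lines. For the separate summaries asked about in the question, one possible approach is a collections.Counter per field. This is only a sketch under assumptions: the helper names count_words and count_by_field are hypothetical, and the cleaning steps simply mirror those in the question's code.

```python
import json
import string
from collections import Counter

# Field names taken from the JSON structure shown in the question.
FIELDS = ["Text Main", "Text Pro", "Text Con"]


def count_words(text):
    """Lowercase the text, strip punctuation, and count whitespace-separated words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())


def count_by_field(records, fields=FIELDS):
    """Build one Counter per field, accumulated over every record in the list."""
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            # Missing fields are treated as empty text (an assumption).
            counts[field] += count_words(record.get(field, ""))
    return counts


# Inline sample with the same structure as the file in the question.
sample = json.loads("""[
  {"Stock Symbol": "A", "Date": "05/11/2017",
   "Text Main": "Text sample 1",
   "Text Pro": "Text sample 2",
   "Text Con": "Text sample 3"}
]""")

for field, counter in count_by_field(sample).items():
    print(field, dict(counter))
```

Each Counter can then be written out with csv.writer, for example one row per (field, word, count) triple, if a single output file is preferred.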

Upvotes: 1
