Leo Connelly

Reputation: 13

How do I loop through an entire JSON file and extract the data into variables?

I'm working on a Python script that extracts movies and their details from a JSON file and then saves the data to a custom movie object. Right now, I can select a single movie out of the huge list.

However, I want to be able to loop through and get every single genre, director, actor and add them to a separate array. Right now when I try to do this I get this error:

Traceback (most recent call last):
  File "/Users/leoconnelly/PycharmProjects/MLFinal/tester.py", line 27, in <module>
    tempGenre = (contents['results'][i]['genre'])
TypeError: list indices must be integers or slices, not str

I also want to create an array of my custom movie objects that has title, cast, director, and genre.

Here's what my code looks like:

from movie import Movie
from user import User
import json
from pprint import pprint


movieArray = []
nameArray = []
directorArray =  []
genreArray = []
##actorArray = []

movieToBeInputted = Movie("","","","")


with open('movies.json') as f:
    contents = json.load(f)
    print(contents['results'][600]['title'])
    movieToBeInputted.name = (contents['results'][600]['title'])
    movieToBeInputted.director = (contents['results'][600]['director'])
    movieToBeInputted.genre = (contents['results'][600]['genre'])
    movieToBeInputted.actors = (contents['results'][600]['cast'])
    movieArray.append(movieToBeInputted)


for i in contents:
    tempGenre = (contents['results'][i]['genre'])
    genreArray.append(tempGenre) #this is where the error happens

    print("xxxxxxx")
    print(movieToBeInputted.actors)




##d = json.load(json_data)

##json_movie_data = json.dumps(json_data)




##movieToBeInputted.actors = json_movie_data

Here's my JSON data:

{
  "results": [
    {
      "title": "After Dark in Central Park",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": null,
      "notes": null
    },
    {
      "title": "Boarding School Girls' Pajama Parade",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": null,
      "notes": null
    },
    {
      "title": "Buffalo Bill's Wild West Parad",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": null,
      "notes": null
    },
    {
      "title": "Caught",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": null,
      "notes": null
    },
    {
      "title": "Clowns Spinning Hats",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": null,
      "notes": null
    },
    {
      "title": "Capture of Boer Battery by British",
      "year": 1900,
      "director": "James H. White",
      "cast": null,
      "genre": "Short documentary",
      "notes": null
    },
    {
      "title": "The Enchanted Drawing",
      "year": 1900,
      "director": "J. Stuart Blackton",
      "cast": null,
      "genre": null,
      "notes": null
    },
    {
      "title": "Family Troubles",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": null,
      "notes": null
    },
    {
      "title": "Feeding Sea Lions",
      "year": 1900,
      "director": null,
      "cast": "Paul Boyton",
      "genre": null,
      "notes": null
    },
    {
      "title": "How to Make a Fat Wife Out of Two Lean Ones",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": "Comedy",
      "notes": null
    },
    {
      "title": "New Life Rescue",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": null,
      "notes": null
    },
    {
      "title": "New Morning Bath",
      "year": 1900,
      "director": null,
      "cast": null,
      "genre": null,
      "notes": null
    }
  ]
}

Upvotes: 1

Views: 1331

Answers (1)

OneCricketeer

Reputation: 191874

You need to write for i in range(len(contents['results'])); then contents['results'][i] works, because list indices must be integers.

When you write for i in contents, you are looping over the keys of the contents dictionary, which are strings. Here the only key is 'results', so contents['results']['results'] raises the TypeError.
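To make the cause of the error concrete, here is a minimal sketch. It assumes a tiny inline stand-in for movies.json with the same shape as the data in the question; the variable names keys, error_message, and genres_by_index are only for illustration.

```python
import json

# Inline stand-in for movies.json (assumed shape: a dict with a 'results' list).
contents = json.loads(
    '{"results": ['
    '{"title": "Caught", "genre": null},'
    '{"title": "Capture of Boer Battery by British", "genre": "Short documentary"}'
    ']}'
)

# Iterating over a dict yields its keys, which are strings here.
keys = [i for i in contents]
print(keys)

# Using one of those string keys as a list index reproduces the error.
try:
    contents['results'][keys[0]]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Integer indices from range(len(...)) work as expected.
genres_by_index = [
    contents['results'][i]['genre']
    for i in range(len(contents['results']))
]
print(genres_by_index)
```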


However, contents['results'] is a list. You can loop over those as complete objects rather than getting a specific numeric index.
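As a rough sketch of looping over the list directly, the example below again assumes a small inline stand-in for movies.json. It also shows enumerate as one option if the position is still needed; titles_by_position is a hypothetical name used only here.

```python
import json

# Inline stand-in for movies.json (assumed shape: a dict with a 'results' list).
contents = json.loads(
    '{"results": ['
    '{"title": "Caught", "genre": null},'
    '{"title": "Capture of Boer Battery by British", "genre": "Short documentary"},'
    '{"title": "How to Make a Fat Wife Out of Two Lean Ones", "genre": "Comedy"}'
    ']}'
)

# Each item in the list is already a complete movie dict, so no index is needed.
genres = []
for movie in contents['results']:
    genres.append(movie['genre'])
print(genres)

# enumerate() supplies the integer position alongside each item when it is wanted.
titles_by_position = {
    i: movie['title'] for i, movie in enumerate(contents['results'])
}
print(titles_by_position[0])
```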

The following uses a list comprehension to build a complete list of Movie objects from the results list.

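import json
from movie import Movie  # the asker's own class, as imported in the question

with open('movies.json') as f:
    contents = json.load(f)

# Fall back to an empty list if the 'results' key is missing.
results = contents.get('results', [])

# Build one Movie per entry; .get() returns None for any missing field.
movies = [
    Movie(
        r.get('title'),
        r.get('director'),
        r.get('genre'),
        r.get('cast')
    ) for r in results
]

for m in movies:
    print(m.name)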

"I want to be able to loop through and get every single genre, director, actor and add them to a separate array"

You can do something similar with the movies list.

This returns the unique directors across all movies: the set drops duplicates, and list() converts it back to a list. Movies with no director (None) are skipped.

directors = list(set(m.director for m in movies if m.director is not None))
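The same pattern extends to genres and actors. The sketch below is an illustration under assumptions: the Movie dataclass is a hypothetical stand-in for the asker's own class (whose definition is not shown), with the attribute names used in the question, and unique_values is a helper invented here. It sorts the result so the order is predictable, which a plain set does not guarantee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Movie:
    """Hypothetical stand-in for the asker's Movie class."""
    name: Optional[str]
    director: Optional[str]
    genre: Optional[str]
    actors: Optional[str]


def unique_values(movies, attribute):
    """Return the sorted, de-duplicated, non-None values of one attribute."""
    values = {getattr(m, attribute) for m in movies}
    return sorted(v for v in values if v is not None)


# A few entries taken from the sample data in the question.
movies = [
    Movie("Capture of Boer Battery by British", "James H. White", "Short documentary", None),
    Movie("The Enchanted Drawing", "J. Stuart Blackton", None, None),
    Movie("Feeding Sea Lions", None, None, "Paul Boyton"),
    Movie("How to Make a Fat Wife Out of Two Lean Ones", None, "Comedy", None),
]

directors = unique_values(movies, "director")
genres = unique_values(movies, "genre")
actors = unique_values(movies, "actors")

print(directors)
print(genres)
print(actors)
```

If cast can hold several names in one string in the full file, it would need splitting first; the sample data only shows a single name or null, so this sketch leaves it as is.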

Upvotes: 2
