6olden

Reputation: 121

Extract/scrape specific data from JSON file

This has been bugging me for quite a few hours. I've been searching a lot and I have found a lot of information. The problem is, I'm not that good; I'm actually a beginner to the max. I'd like to achieve this with Python (if it's possible!). Maybe with JavaScript and PHP also? Let me explain.

I just found this website http://listeningroom.net and it's great. You can create/join rooms and upload tracks and listen to them together with friends.

I'd like to extract/scrape/get some specific data from a .json file. This file contains artist, album title, track title and more. I'd like to extract just the artist, album and track title.

http://listeningroom.net/room/chillasfuck/spins.json This .json file contains the tracks played in the past 24 hours.

After looking around, I managed to print the whole .json file (a local copy) with Python, using the following, probably not so valid, code.

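   import json
   from pprint import pprint

   # Raw string so the backslash in the Windows path is taken literally
   json_data = open(r'...\spins.json')

   data = json.load(json_data)
   pprint(data)

   json_data.close()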

This prints out the following:

   [{u'endTime': u'1317752614105',
   u'id': u'cf37894e8eaf886a0d000000',
   u'length': 492330,
   u'metadata': {u'album': u'Mezzanine',
            u'artist': u'Massive Attack',
            u'bitrate': 128000,
            u'label': u'Virgin',
            u'length': 17494.479054779807,
            u'title': u'Group Four'},

(This is just part of the output.)

1. I'd like to scrape it from a URL (the one provided at the top)
2. Just get 'album', 'artist' and 'title'
3. Make sure it prints it as simply as possible, like this:

Artist
Track title
Album

Artist
Track title
Album

4. If it's not too much, save it to a .txt file

I hope I could get some help, I really want to create this for myself, so I can check out more music!

Marvin

Upvotes: 1

Views: 4823

Answers (4)

Griffin

Reputation: 851

Okay, this is a bit short, but the thing about JSON is that it translates an array into a string.

E.g. array['first'] = 'hello'; array['second'] = 'there';

will become {"first": "hello", "second": "there"} after a json_encode. Run that string through json_decode and you get your array back.

So simply run your JSON file through a decoder, and then you should be able to reach your data (for the first track in the list) through:

array[0]['metadata']['album']
array[0]['metadata']['artist']
...

I have never used Python, but it should be the same.

Have a look at http://www.php.net/manual/en/function.json-decode.php; it might clear up a thing or two.
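
In Python, the same round trip might look like the sketch below. It uses json.dumps and json.loads from the standard library; the sample dictionary and the one-entry spins string are made up for illustration and only mimic the shape of the output shown in the question.

```python
import json

# Hypothetical sample data, mirroring the 'first'/'second' example.
array = {'first': 'hello', 'second': 'there'}

# Encode: Python dict -> JSON string.
encoded = json.dumps(array)

# Decode: JSON string -> Python dict again.
decoded = json.loads(encoded)

# The spins file is a list of objects, so index into the list first,
# then into the nested 'metadata' object.
spins = json.loads('[{"metadata": {"artist": "Massive Attack", "album": "Mezzanine"}}]')
album = spins[0]['metadata']['album']
artist = spins[0]['metadata']['artist']
```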

Upvotes: 2

Jeremy

Reputation: 2000

You're already really close.

data = json.load(json_data)

is reading the JSON from the file and converting it to a Python object - in this case, a list of dictionaries (each of which has a 'metadata' key whose value is another dictionary).

To get this into the format that you want, you just need to loop through the items.

for song in data:
    # Look up the 'metadata' dictionary for this song, then the 'artist' key inside it.
    artist = song['metadata']['artist']
    album = song['metadata']['album']
    songTitle = song['metadata']['title']
    print('%s\n%s\n%s\n' % (artist, album, songTitle))

Or, to print it to a file:

with open('the_file_name.txt','w') as f:
    for song in data:
        artist = song['metadata']['artist']
        album = song['metadata']['album']
        songTitle = song['metadata']['title']
        f.write('%s\n%s\n%s\n' % (artist, album, songTitle))
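
For the first point in the question (reading from the URL rather than a local file), a minimal sketch could look like this. It assumes Python 3, where urlopen lives in urllib.request (on Python 2 it is in urllib2), and it assumes the server returns UTF-8. The function names fetch_spins, format_tracks and save_tracks are made up for this example, and the site may no longer be reachable.

```python
import json
from urllib.request import urlopen

# URL taken from the question; it may no longer be reachable.
SPINS_URL = 'http://listeningroom.net/room/chillasfuck/spins.json'


def fetch_spins(url=SPINS_URL):
    """Download the JSON document and return it as a list of dictionaries."""
    with urlopen(url) as response:
        # Assumption: the server sends UTF-8, the usual encoding for JSON.
        return json.loads(response.read().decode('utf-8'))


def format_tracks(data):
    """Return one artist/title/album block per track, separated by blank lines."""
    blocks = []
    for song in data:
        meta = song['metadata']
        blocks.append('%s\n%s\n%s\n' % (meta['artist'], meta['title'], meta['album']))
    return '\n'.join(blocks)


def save_tracks(data, path):
    """Write the formatted track list to a text file."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(format_tracks(data))
```

Something like save_tracks(fetch_spins(), 'the_file_name.txt') would then fetch the list and write it out in the artist, track title, album order asked for in the question.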

Upvotes: 2

JBernardo

Reputation: 33407

Python (after you have loaded the JSON):

for elem in data:
    print('{artist}\n{title}\n{album}\n'.format(**elem['metadata']))

To save in a file:

with open('the_file_name.txt','w') as f:
    for elem in data:
        f.write('{artist}\n{title}\n{album}\n\n'.format(**elem['metadata']))
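
One caveat: format(**elem['metadata']) raises a KeyError if an entry lacks one of the three fields. Whether that ever happens in spins.json is only an assumption here; if it does, a sketch like the following fills in a placeholder instead. The describe name, the 'Unknown' default and the sample entries are made up for illustration.

```python
def describe(metadata, default='Unknown'):
    """Format one track, substituting a placeholder for any missing field."""
    fields = {key: metadata.get(key, default) for key in ('artist', 'title', 'album')}
    return '{artist}\n{title}\n{album}\n'.format(**fields)


# Hypothetical entries: the second one has no 'album' key.
data = [
    {'metadata': {'artist': 'Massive Attack', 'title': 'Group Four', 'album': 'Mezzanine'}},
    {'metadata': {'artist': 'Them, Roaringtwenties',
                  'title': 'Fast Acting Nite-Nite Spray With Realistic Uncle Beard'}},
]

for elem in data:
    print(describe(elem['metadata']))
```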

Upvotes: 3

Mob

Reputation: 11106

For PHP you need json_decode:

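<?php
$url = 'http://listeningroom.net/room/chillasfuck/spins.json'; // URL from the question
$json = file_get_contents($url);
$val = json_decode($json);   // decodes to an array of objects
$room = $val[0]->metadata;   // metadata of the first track only
echo "Album : ".$room->album."\n";
echo "Artist : ".$room->artist."\n";
echo "Title : ".$room->title."\n";
?>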

Outputs

Album  :  Future Sandwich
Artist :  Them, Roaringtwenties
Title  :  Fast Acting Nite-Nite Spray With Realistic Uncle Beard

Note that there's a truckload of JSON data there, so you'll have to loop over $val to get every track rather than just the first one.

Upvotes: 1
