alwaysaskingquestions

Reputation: 1657

Trying to turn a dictionary into dataframe and append to the same dataframe with a for loop

Ok, to be completely honest, I am not exactly sure how to ask this question, since I think the error could happen in multiple places, so I'll just type all of them out (thanks for being patient with a noob here).

I am trying to use the lastfm database: https://grouplens.org/datasets/hetrec-2011/

They provide a Python script that helps read the data from this dataset.

What I did first was parse the lines of the file with the given iter_lines function:

### first, open file into a file handle object
file = os.path.join(baseDir, 'artists.dat')
file_opener = open(file, "r")
lines = iter_lines(file_opener)

where the given iter_lines() function looks like this:

def iter_lines(open_file):
    reader = csv.reader(
        open_file,
        delimiter='\t',
    )
    next(reader)  # Skip the header
    return reader

Then I tried to use their given parse_artist_line() function to read artists.dat:

artists_df = pd.DataFrame(['key','value'])

for line in lines:
    ### so the parse_artist_line() will return a dictionary
    artist_dict = parse_artist_line(line)
    artist_list = artist_dict.items()

    ### try to put in a temporary dataframe
    temp = pd.DataFrame.from_dict(artist_dict, orient='index')

    ### finally append the temporary df to the artists_df
    artists_df.append(temp, ignore_index=True)

print(artists_df.head(5))

When I print artists_df with the last statement, I only get this output:

       0
0    key
1  value

Their parse_artist_line() looks like this:

def parse_artist_line(line):
    (artist_id, name, _, _) = line
    current_artist = deepcopy(ARTISTS)
    current_artist["artist_id"] = int(artist_id)
    current_artist["name"] = name

    return current_artist

By the way, if you print temp, it looks like this:

                     0
artist_id        18743
name       Coptic Rain

And if I try to pass "columns" as the orient argument to from_dict(), I get an error:

ValueError: If using all scalar values, you must pass an index

I've followed the following posts/info pages:

I'm no longer sure what I'm doing wrong (probably every step). Any help or guidance is appreciated!

Upvotes: 1

Views: 87

Answers (1)

jezrael

Reputation: 862671

I believe it is not necessary here to convert the file to a dict and then to a DataFrame. It is simpler to use read_csv, and if you only need some of the columns, add the usecols parameter:

artists_df = pd.read_csv('artists.dat', sep='\t', usecols=['id','name'])
print(artists_df.head())

   id               name
0   1       MALICE MIZER
1   2    Diary of Dreams
2   3  Carpathian Forest
3   4       Moi dix Mois
4   5        Bella Morte

If you want to read all columns:

artists_df = pd.read_csv('artists.dat', sep='\t')
print(artists_df.head())

   id               name                                         url  \
0   1       MALICE MIZER       http://www.last.fm/music/MALICE+MIZER   
1   2    Diary of Dreams    http://www.last.fm/music/Diary+of+Dreams   
2   3  Carpathian Forest  http://www.last.fm/music/Carpathian+Forest   
3   4       Moi dix Mois       http://www.last.fm/music/Moi+dix+Mois   
4   5        Bella Morte        http://www.last.fm/music/Bella+Morte   

                                          pictureURL  
0    http://userserve-ak.last.fm/serve/252/10808.jpg  
1  http://userserve-ak.last.fm/serve/252/3052066.jpg  
2  http://userserve-ak.last.fm/serve/252/40222717...  
3  http://userserve-ak.last.fm/serve/252/54697835...  
4  http://userserve-ak.last.fm/serve/252/14789013...  
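
If the rows do have to go through parse_artist_line(), note that the loop in the question leaves artists_df unchanged because DataFrame.append returned a new DataFrame instead of modifying the existing one, and that result was never assigned (the method was also removed in pandas 2.0). A minimal sketch of one alternative is to collect the dictionaries in a list and build the DataFrame once at the end. The sample data and the simplified parse_artist_line() below are stand-ins for illustration only: the real function copies an ARTISTS template that is not shown in the question, and the url and pictureURL values here are placeholders.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for artists.dat: tab-separated, with a header row.
# The url/pictureURL fields are placeholders, not real data.
sample = (
    "id\tname\turl\tpictureURL\n"
    "1\tMALICE MIZER\turl-1\tpicture-1\n"
    "2\tDiary of Dreams\turl-2\tpicture-2\n"
)


def parse_artist_line(line):
    # Simplified stand-in for the dataset's helper: keep only id and name.
    artist_id, name, _, _ = line
    return {"artist_id": int(artist_id), "name": name}


reader = csv.reader(io.StringIO(sample), delimiter="\t")
next(reader)  # Skip the header

# One dictionary per row; each dictionary becomes one DataFrame row.
records = [parse_artist_line(line) for line in reader]
artists_df = pd.DataFrame(records)
print(artists_df)
```

Passing a list of dictionaries also avoids the "If using all scalar values, you must pass an index" error, since each dictionary is treated as a row rather than as a set of scalar columns.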

Upvotes: 1
