Reputation: 27
I have a CSV file that contains the name of a video game, its platform, genre, publisher, etc. I am trying to create 3 separate dictionaries. The first dictionary was easy, since its key is the title of the video game, which is unique.
For the 2nd and 3rd dictionaries, I am having issues because the keys "Genre" and "Publisher" are not unique. I am trying to have D2 look like:
D2 = {'Puzzle': [(tup2), (tup2)], 'Another genre': [(tup2)], ...}
This is because multiple games can have the same genre.
import csv
fp = open("video_game_sales_tiny.csv", 'r')
fp.readline()
reader = csv.reader(fp)
D1 = {}
D2 = {}
D3 = {}
for line in reader:
    name = line[0].lower().strip()
    platform = line[1].lower().strip()
    if line[2] in (None, 'N/A'):
        pass
    else:
        year = int(line[2])
        genre = line[3].lower().strip()
        publisher = line[4]
        na_sales = float(line[5])
        europe_sales = float(line[6])*1000000
        japan_sales = float(line[7])*1000000
        other_sales = float(line[8])*1000000
        global_sales = (europe_sales + japan_sales + other_sales)
        tup = (name, platform, year, genre, publisher, global_sales)
        tup2 = (genre, year, na_sales, europe_sales, japan_sales, other_sales, global_sales)
        tup3 = (publisher, name, year, na_sales, europe_sales, japan_sales, other_sales, global_sales)
        D1[name] = tup
        D2[genre] = tup2
        D3[publisher] = tup3
print(D1)
print(D2)
print(D3)
Upvotes: 0
Views: 83
Reputation:
You have a problem with non-unique keys.
Once that is dealt with (you need unique keys), the pandas merge() method is very powerful and should solve your problem. It can be used with any of the how options (left, right, inner, ...).
But you first need to do something about the non-unique keys. I suggest using the unique() method to build your own list of index values for each DataFrame. This adds only one more layer to your ETL process.
Suppose you have two DataFrames, df_a and df_b, that share a unique key called u_key.
The merge of these DataFrames would look something like:
import pandas as pd
...
left_merge = pd.merge(df_a, df_b, on=["u_key"], how="left")
Upvotes: 0
Reputation: 51008
You should create the entry for genre (for instance) as a list, and then append to the list.
if genre not in D2:
    D2[genre] = []
D2[genre].append(tup2)
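The same append-to-a-list idea can be sketched with collections.defaultdict, which creates the empty list the first time a key is seen. This is only a minimal illustration: the in-memory SAMPLE data, its five-column layout, the (name, year) record, and the group_rows helper are all made up for the example and are not the asker's real file or tuples.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for video_game_sales_tiny.csv
# (assumed columns: name, platform, year, genre, publisher).
SAMPLE = """Name,Platform,Year,Genre,Publisher
Tetris,GB,1989,Puzzle,Nintendo
Dr. Mario,NES,1990,Puzzle,Nintendo
Pong,Arcade,1972,Sports,Atari
"""

def group_rows(fp):
    """Group (name, year) records by genre and by publisher."""
    by_genre = defaultdict(list)      # a missing key starts as an empty list
    by_publisher = defaultdict(list)
    reader = csv.reader(fp)
    next(reader)                      # skip the header row
    for row in reader:
        name, _platform, year, genre, publisher = (field.strip() for field in row)
        record = (name, int(year))
        by_genre[genre.lower()].append(record)
        by_publisher[publisher].append(record)
    # Convert back to plain dicts so later lookups of unknown keys raise KeyError.
    return dict(by_genre), dict(by_publisher)

D2, D3 = group_rows(io.StringIO(SAMPLE))
print(D2)
print(D3)
```

With the real file, the io.StringIO(SAMPLE) argument would be replaced by the opened file object, and the record tuple by tup2 or tup3 from the question.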
Upvotes: 1