Reputation: 11669
I have a csv file like this:
[email protected], 01-05-2014
[email protected], 01-05-2014
[email protected], 01-05-2014
[email protected], 01-05-2014
I need to read the above csv file and extract domain name and also the count of emails address by domain name and date as well. All these things I need to insert into MySQL database but somehow I am stuck how to insert into MySQL database after iterating the list I got.
Query will be like this:
INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES (%s, %s, %s);
Below is the code
#!/usr/bin/python
import fileinput
import csv
import os
import sys
import MySQLdb
from collections import defaultdict
lst = defaultdict(list)
d_lst = defaultdict(list)
# ======================== Defined Functions ======================
def get_file_path(filename):
currentdirpath = os.getcwd()
# get current working directory path
filepath = os.path.join(currentdirpath, filename)
return filepath
# ===========================================================
def read_CSV(filepath):
domain_list = []
domain_date_list = []
sorted_domain_list_bydate = defaultdict(list)
with open(filepath, 'rb') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
# insert the 1st & 2nd column of the CSV file into a set called input_list
email = row[0].strip().lower()
date = row[1].strip()
domain_date_list.append([date, email[ email.find("@") : ]])
domain_list.append(email[ email.find("@") : ])
for k, v in domain_date_list:
sorted_domain_list_bydate[k].append(v)
# remove duplicates from domain list
domain_list = list(set(domain_list))
return sorted_domain_list_bydate, domain_list
# ===========================================================
def update_DB(lst):
# open a database connection
db = MySQLdb.connect(host="localhost", # your host, usually localhost
user="root", # your username
passwd="abcdef1234", # your password
db="test") # name of the data base
cur = db.cursor()
a = []
for k, v in lst.items():
# now what should I do here?
# this is what I am confuse
db.commit()
db.close()
# ==========================================================
# ======================= main program =======================================
path = get_file_path('emails.csv')
[lst, d_lst] = read_CSV(path) # read the input file
update_DB(lst) # insert data into domains table
I am confuse in update_DB
method.
Upvotes: 0
Views: 5290
Reputation: 174624
I am not sure why you have such a complicated program for a simple task. Lets start from the top:
You need to first organize your data correctly by domain, date and then count.
import csv
from collections import defuaultdict, Counter
domain_counts = defaultdict(Counter)
with open('somefile.csv') as f:
reader = csv.reader(f)
for row in reader:
domain_counts[row[0].split('@')[1].strip()][row[1]] += 1
Next, you need to insert each row correctly in the database:
db = MySQLdb.connect(...)
cur = db.cursor()
q = 'INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES(%s, %s, %s)'
for domain, data in domain_counts.iteritems():
for email_date, email_count in data.iteritems():
cur.execute(q, (domain, email_count, email_date))
db.commit()
As your dates are not being inserted correctly, try this updated query instead:
q = """INSERT INTO
domains(domain_name, cnt, date_of_entry)
VALUES(%s, %s, STR_TO_DATE(%s, '%d-%m-%Y'))"""
Upvotes: 2
Reputation: 5113
the read_csv function here is returning the sorteddomainlistbydate,and the domain_list (which is a list), used by the update_db function, that's where you do the insert.
your list just contains the domain names, while what it should for per pair of key vals should contain the domain name and the count like
google.com,2
live.com,1
for k, v in lst.items():
cur.execute("INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES ('" + str(k) + "','" + str(v) + "','" + str(time.strftime("%d/%m/%Y"))+"')")
Upvotes: 0