acpigeon
acpigeon

Reputation: 1729

Problems handling multiple data types in CSV input with Python

I have a csv dump that I'm trying to import to run analysis on metrics therein, cherry picking certain metrics to look at. Some of the cells are strings and some are numbers. However, I can't get csv.reader to handle the numbers properly. A snippet:

with open('t0.csv', 'rU') as file:
    reader = csv.reader(file, delimiter=",", quotechar='|')
    reader.next() # Burn header row
    for row in reader:
        if row[0] != "": # Burn footer row
            t0_company.extend([unicode(row[3], 'utf-8')])
            t0_revenue.extend([row[9]])
            t0_clicks.extend([row[10]])
            t0_pageviews.extend([row[11]])
            t0_avg_cpc.extend([row[13]])
            t0_monthly_budget.extend([row[16]])

I input another file of the same format for metrics at t1. Then I create two dicts for each metric (one at t0 and the other at t1) with the form metric_dict = {'company': 'metric'} like this:

metric = dict(zip(company, metric))

Running simple math on these metrics is problematic however:

percent_change = float(t1_metric_dict[company]) / float(t0_metric_dict[company]) - 1

Returns errors like:

Traceback (most recent call last):
File "report.py", line 104, in <module>
start_revenue_dict[company], end_revenue_dict[company], float(end_revenue_dict[company]) / float(start_revenue_dict[company]) - 1,
ValueError: could not convert string to float: "6.18"

It seems to pick the same number to complain about every time.

I'm fairly certain the error happens in the division as everything works normally if I swap in a placeholder string as the third element.

I also tried using quoting=csv.QUOTE_NONNUMERIC, changing the second line in the first snippet to

reader = csv.reader(file, delimiter=",", quotechar='|', quoting=csv.QUOTE_NONNUMERIC)

Which gets me this error:

Traceback (most recent call last):
File "report.py", line 30, in <module>
reader.next()
ValueError: could not convert string to float: "Type"

I've tried making sure the csv doesn't have any weird cell types (everything is text) even though I doubt it matters. I'd appreciate any help on this one.

------ Update ------

One of the columns in my input file contains email addresses. As an experiment, I removed all @s from the input docs which changed the error message I'm getting to:

Traceback (most recent call last):
File "report.py", line 129, in <module>
unicode_row = [str(item).encode('utf8') for item in utf8_row]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 2: ordinal not in range(128)

The code it's referencing is the csv out section:

writer = csv.writer(open('output.csv', 'wb'), delimiter=",", quotechar='|')
for utf8_row in report:
    unicode_row = [str(item).encode('utf8') for item in utf8_row]
    writer.writerow(unicode_row)

----- Update #2 -----

As requested, here is the full snippet that's causing problems:

for company in companies_in_both:
report.append([company,
      start_revenue_dict[company], end_revenue_dict[company], float(end_revenue_dict[company]) / float(start_revenue_dict[company]) - 1,
      start_clicks_dict[company], end_clicks_dict[company], float(end_clicks_dict[company]) / float(start_clicks_dict[company]) - 1,
      start_pageviews_dict[company], end_pageviews_dict[company], float(end_pageviews_dict[company]) / float(start_pageviews_dict[company]) - 1,
      start_avg_cpc_dict[company], end_avg_cpc_dict[company], float(end_avg_cpc_dict[company]) / float(start_avg_cpc_dict[company]) - 1,
      start_monthly_budget_dict[company], end_monthly_budget_dict[company], float(end_monthly_budget_dict[company]) / float(start_monthly_budget_dict[company]) - 1])

Upvotes: 0

Views: 589

Answers (1)

John Y
John Y

Reputation: 14559

Are you sure '|' is the right thing to be using for quotechar? I see you have numerous CSV questions, and all your examples use that, so maybe it really is what you want. But it's astoundingly unusual.

Your ValueError message is telling you that you have double-quotes in your data. That is, instead of doing the equivalent of float("6.18"), Python is trying to do float('"6.18"'), and it's choking.

It may be helpful to give a few complete lines of your actual CSV data, surrounding where the error occurs, as viewed with Notepad or equivalent (provided it's not confidential, legally sensitive, etc.).

Upvotes: 1

Related Questions