Reputation: 854

Python Adding multiple data points in a dict from CSV file

I have a CSV file that looks like:

CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0

I want to analyse the file to work out some statistics from the data:

CountryCode, NumberOfTimesCalled, TotalPrice, TotalCallDuration
US, 4, 0.0375, 54

At the moment, I have a dict that's set up like this:

CalledStatistics = {}

When I read each line from the CSV, what's the best way to put the data into the dict?

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

Would adding the second US line overwrite the first line, or would the data be added based on the key 'CountryCode'?

Upvotes: 0

Answers (3)

zezollo

Reputation: 5017

Each of these assignments:

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

would overwrite the previous one, because a dict holds only one value per key.
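To see the overwriting for yourself, here is a minimal sketch. The name called_statistics and the sample values are only illustrative; they are not taken from your file-reading code. It also shows that braces containing bare items, as in your line, build a set rather than a dict.

```python
called_statistics = {}

# Assigning to the same key twice keeps only the last value.
called_statistics['US'] = {'CallDuration': 4, 'CallPrice': 0.0125}
called_statistics['US'] = {'CallDuration': 39, 'CallPrice': 0.0125}
print(called_statistics)  # {'US': {'CallDuration': 39, 'CallPrice': 0.0125}}

# Braces with bare items (no key: value pairs) create a set, not a dict.
fields = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}
print(type(fields).__name__)  # set
```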

To calculate the sums you need, you could use a dict of dicts. For example, inside a for loop where the data of the current row is held in the variables country_code, call_duration and call_price, you would store it in collected_statistics as follows. (EDIT: I added the first line to turn call_price into 0 if it is recorded as None in the data. This piece of code is meant to work with consistent data, such as integers only. If other types of data can occur, they need to be converted to integers [or any numbers of the same type] before Python can sum them.)

call_price = call_price if call_price is not None else 0

if country_code not in collected_statistics:
    collected_statistics[country_code] = {'CallDuration' : [call_duration],
                                          'CallPrice' : [call_price]}
else:
    collected_statistics[country_code]['CallDuration'] += [call_duration]
    collected_statistics[country_code]['CallPrice'] += [call_price]

and after the loop, for each country_code:

number_of_times_called[country_code] = len(collected_statistics[country_code]['CallDuration'])

total_call_duration[country_code] = sum(collected_statistics[country_code]['CallDuration'])
total_price[country_code] = sum(collected_statistics[country_code]['CallPrice'])

OK, so finally here is a complete working script handling the example you gave:

#!/usr/bin/env python3

import csv
import decimal

with open('CalledData', newline='') as csvfile:
    csv_r = csv.reader(csvfile, delimiter=',', quotechar='|')

    # btw this creates a dict, not a set
    collected_statistics = {}

    for row in csv_r:

        [country_code, number_called, call_price, call_duration] = row

        # Only to avoid the first line, but would be better to have a list of available
        # (and correct) codes, and check if the country_code belongs to this list:
        if country_code != 'CountryCode':

            call_price = call_price if call_price != 'None' else 0

            if country_code not in collected_statistics:
                collected_statistics[country_code] = {'CallDuration' : [int(call_duration)],
                                                      'CallPrice' : [decimal.Decimal(call_price)]}
            else:
                collected_statistics[country_code]['CallDuration'] += [int(call_duration)]
                collected_statistics[country_code]['CallPrice'] += [decimal.Decimal(call_price)]


    for country_code in collected_statistics:
        print(str(country_code) + ":")
        print("number of times called: " + str(len(collected_statistics[country_code]['CallDuration'])))
        print("total price: " + str(sum(collected_statistics[country_code]['CallPrice'])))
        print("total call duration: " + str(sum(collected_statistics[country_code]['CallDuration'])))

Using CalledData as a file with exactly the content you provided, it outputs:

$ ./test_script
BS:
number of times called: 2
total price: 0.40500
total call duration: 30
US:
number of times called: 4
total price: 0.03750
total call duration: 54
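If you only need the totals, a variation is to keep running sums instead of lists. This is just a sketch under a few assumptions: the inline SAMPLE string stands in for the CalledData file, and the names totals, calls, price and duration are made up for the example. It relies on csv.DictReader with skipinitialspace=True so that the spaces after the commas in the header row do not end up in the field names.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Inline sample standing in for the CalledData file (an assumption for the demo).
SAMPLE = """CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0
"""

# Every new country code starts with zeroed counters.
totals = defaultdict(lambda: {'calls': 0, 'price': Decimal('0'), 'duration': 0})

# skipinitialspace=True drops the blank after each comma in the header row.
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
for row in reader:
    entry = totals[row['CountryCode']]
    entry['calls'] += 1
    # A price recorded as the string 'None' counts as zero.
    if row['CallPrice'] != 'None':
        entry['price'] += Decimal(row['CallPrice'])
    entry['duration'] += int(row['CallDuration'])

for code, entry in totals.items():
    print(code, entry['calls'], entry['price'], entry['duration'])
```

With a real file you would pass the open file object to csv.DictReader instead of io.StringIO(SAMPLE).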

Upvotes: 2

catalesia

Reputation: 3378

Your approach could be slightly different. Just read the file and turn it into a list of lists: call readlines(), then strip("\n") and split(",") on each line.

Skip the first row and the last one (it will most likely be empty, but test that). Then you can build the dict using the example @zezollo gave and simply add values by key. Make sure all the values you are adding, once you have your list of lists, are of the same type.
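Roughly, the manual approach could look like the sketch below. It is only an illustration: io.StringIO stands in for the real file, and the names sample_file, rows and cleaned are made up. Treating a price of None as 0.0 and using float for prices are assumptions too.

```python
import io

# Inline text standing in for the CSV file (assumption); with a real file,
# use open('CalledData') instead of io.StringIO(...).
sample_file = io.StringIO(
    "CountryCode, NumberCalled, CallPrice, CallDuration\n"
    "BS,+1234567,0.20250,29\n"
    "US,+121234,0.01250,4\n"
    "US,+18765678,None,0\n"
    "\n"
)

lines = sample_file.readlines()
# Drop the header, strip newlines, and skip blank lines.
rows = [line.strip("\n").split(",") for line in lines[1:] if line.strip()]

# Convert each column to one consistent type before adding anything up.
cleaned = [
    [code, number, 0.0 if price == "None" else float(price), int(duration)]
    for code, number, price, duration in rows
]
print(cleaned)
```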

There's nothing like hard work; you'll remember this case for a long time ;)

Test, test, test on mock examples. And read Python help and docs. It's brilliant.

Upvotes: 0

Arctelix

Reputation: 4576

Dictionaries can contain lists and lists of dictionaries, so you can achieve your desired structure as follows:

CalledStatistics['CountryCode'] = [{
    'NumberCalled': nc_val,
    'CallPrice': cp_val,
    'CallDuration': cd_val}]

Then you can add values like this (assuming lines holds the data rows without the header):

for line in lines:
    parts = line.strip().split(',')
    # setdefault() creates the empty list the first time a country code appears
    CalledStatistics.setdefault(parts.pop(0), []).append({
        'NumberCalled': parts[0],
        'CallPrice': parts[1],
        'CallDuration': parts[2]})

By making the value for each country code a list, you can add as many dicts as you want to each country code. The number of times a country was called is then simply the length of its list.

The pop(i) method returns the value and mutates the list, so all that remains is the data you want for the dict values. That's why we pop index 0 and then use indices 0 to 2 for the dict.
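Here is a small sketch of what pop(0) does to a split line, and of how the per-country list can be summarised afterwards. The sample line comes from the question; the calls list and its numeric values are hypothetical and assume the strings were already converted to numbers.

```python
line = "US,+121234,0.01250,4"
parts = line.split(",")

country_code = parts.pop(0)   # removes and returns the first item
print(country_code)           # US
print(parts)                  # ['+121234', '0.01250', '4']

# Hypothetical per-country list of call dicts, as in the structure above.
calls = [
    {'CallDuration': 4, 'CallPrice': 0.0125},
    {'CallDuration': 39, 'CallPrice': 0.0125},
]
number_of_times_called = len(calls)
total_duration = sum(call['CallDuration'] for call in calls)
print(number_of_times_called, total_duration)  # 2 43
```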

Upvotes: 0
