Alan

Reputation: 13

How can I calculate the percentage of strings that have a certain value in a field?

I have a comma-separated CSV file. I need to read the file, find the rows that have a certain value (say, Blue) in a field (colour), and calculate the percentage of rows that fit the criteria.

My code so far is as follows:

myfile = open('3517315a.csv','r')

myfilecount = 0
linecount = 0
firstline = True

for line in myfile:
    if firstline:
        firstline = False
        continue
    fields = line.split(',')

    linecount += 1
    count = int(fields[0])
    colour = str(fields[1])
    channels = int(fields[2])
    code = str(fields[3])
    correct = str(fields[4])
    reading = float(fields[5])

I don't know how I can set the condition and calculate the percentage.

Upvotes: 0

Views: 483

Answers (3)

Log2

Reputation: 141

If you are willing to use third-party modules, then I highly suggest that you use pandas. The code would roughly be:

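import pandas as pd

df = pd.read_csv("my_data.csv")
# Number of rows whose colour column is exactly "Blue"
blues = len(df[df.colour == "Blue"])
# Multiply by 100 so the value matches the "%" in the message
percentage = blues / len(df) * 100
print(f"{percentage:.2f}% of the colours are blue")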
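If you want the share of every colour at once, pandas can compute it in one call with Series.value_counts(normalize=True), which returns fractions that only need scaling by 100. This is a sketch assuming the same colour column; the inline CSV data is made up for illustration and stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file
CSV_DATA = """count,colour
1,Blue
2,Red
3,Blue
4,Green"""

df = pd.read_csv(io.StringIO(CSV_DATA))

# Fraction of rows per colour, scaled to percent
shares = df["colour"].value_counts(normalize=True) * 100
print(shares)

# Percentage for one value; falls back to 0.0 if the value never occurs
blue_pct = shares.get("Blue", 0.0)
print(f"{blue_pct:.2f}% of the colours are blue")
```

For the real file, pass the file name to pd.read_csv instead of io.StringIO(CSV_DATA).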

Upvotes: 0

Fredrick Brennan

Reputation: 7357

Try this :) It is more easily configurable than the other answer, and thanks to the csv module it correctly handles quoted fields that contain commas, which a plain split(',') does not. Tested with Python 3.6.1.

import csv
import io # needed because our file is not really a file

CSVFILE = """name,occupation,birthyear
John,Salesman,1992
James,Intern,1997
Abe,Salesman,1983
Michael,Salesman,1994"""

f = io.StringIO(CSVFILE) # needed because our file is not really a file

# This is the name of the column we want to know about
our_row = 'occupation'
# If we want to limit the output to one value, put it here.
our_value = None # For example, try 'Intern'
# This will hold the total number of rows
row_total = 0

totals = dict()

for row in csv.DictReader(f):
    v = row[our_row]
    # If we've already come across a row with this value before, add 1 to it
    if v in totals:
        totals[v] += 1
    else:  # First time we see this value, so start its count at 1
        totals[v] = 1

    row_total += 1

for k, v in totals.items():
    # Skip every value except the one we asked for (if any)
    if our_value and k != our_value:
        continue

    print("{}: {:.2f}%".format(k, v/row_total*100))

Output:

Salesman: 75.00%
Intern: 25.00%
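The same tally can be written more compactly with collections.Counter from the standard library. This is a sketch reusing the sample data above, not a replacement for the configurable version; the column variable plays the same role as our_row.

```python
import csv
import io
from collections import Counter

CSVFILE = """name,occupation,birthyear
John,Salesman,1992
James,Intern,1997
Abe,Salesman,1983
Michael,Salesman,1994"""

# Count how often each value appears in the chosen column
column = "occupation"
counts = Counter(row[column] for row in csv.DictReader(io.StringIO(CSVFILE)))
row_total = sum(counts.values())

# Convert each count into a percentage of all data rows
percentages = {value: n / row_total * 100 for value, n in counts.items()}
for value, pct in percentages.items():
    print("{}: {:.2f}%".format(value, pct))
```

Counter does the "add 1 or start at 1" bookkeeping that the totals dict does by hand.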

Upvotes: 1

eega

Reputation: 560

Well, there are basically three steps to this:

  1. Get the number of lines in the file. You already do this with linecount.
  2. Get the number of occurrences of your condition. Take colour: you have already extracted it, so you only have to compare it to the value you are looking for, e.g. if colour == "Blue".
  3. Calculate the percentage, which is occurrences / linecount * 100.
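Before the full loop, here is the counting-and-dividing logic of those three steps in isolation. percentage_of is a hypothetical helper name, and returning 0.0 for an empty input (instead of raising ZeroDivisionError) is an assumption, not something the question requires.

```python
def percentage_of(values, target):
    """Return the percentage of items in values that equal target.

    Assumption: an empty input yields 0.0 instead of raising ZeroDivisionError.
    """
    total = 0        # step 1: number of data lines
    occurrences = 0  # step 2: number of lines matching the condition
    for value in values:
        total += 1
        if value == target:
            occurrences += 1
    if total == 0:
        return 0.0
    return occurrences / total * 100  # step 3: ratio scaled to percent


colours = ["Blue", "Red", "Blue", "Green"]
print(percentage_of(colours, "Blue"))
```

Because it accepts any iterable, the same helper works with a list of colours or a generator that yields the colour field from each line.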

It could look like this:

with open('3517315a.csv', 'r') as myfile:
    linecount = 0
    occurrences_blue = 0
    firstline = True

    for line in myfile:
        # Skip the header line
        if firstline:
            firstline = False
            continue

        fields = line.split(',')
        linecount += 1

        count = int(fields[0])
        colour = str(fields[1])
        channels = int(fields[2])
        code = str(fields[3])
        correct = str(fields[4])
        reading = float(fields[5])

        if colour == 'Blue':
            occurrences_blue += 1

# Multiply by 100 to turn the fraction into a percentage
percentage_blue = occurrences_blue / linecount * 100

This is a very basic example, though. In any case, you should probably use the Python csv library to read the fields from the CSV, as suggested in a comment on your post (https://docs.python.org/2/library/csv.html). I would also expect there to be libraries out there that could solve your problem more efficiently.
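Following that suggestion, here is a sketch of the same loop using csv.reader, which, unlike line.split(','), keeps a quoted field such as "Blue, light" in one piece. The inline data and its six-column layout are assumptions modelled on the question's fields.

```python
import csv
import io

# Hypothetical data in the question's layout: count,colour,channels,code,correct,reading
CSV_DATA = """count,colour,channels,code,correct,reading
1,Blue,3,A1,yes,0.5
2,Red,2,B2,no,1.25
3,"Blue, light",1,C3,yes,2.0
4,Blue,4,D4,no,3.75
"""

linecount = 0
occurrences_blue = 0

reader = csv.reader(io.StringIO(CSV_DATA))
next(reader)  # skip the header row
for fields in reader:
    linecount += 1
    colour = fields[1]  # "Blue, light" arrives as a single field
    if colour == "Blue":
        occurrences_blue += 1

# Guard against an empty file before dividing
percentage_blue = occurrences_blue / linecount * 100 if linecount else 0.0
print("{:.2f}% of the rows are Blue".format(percentage_blue))
```

For the real file, replace io.StringIO(CSV_DATA) with open('3517315a.csv', newline=''); the csv module documentation recommends newline='' when opening files it will read.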

Upvotes: 0
