Reputation: 5

I am trying to make an array for each column of a CSV file

I have an Excel document that I have exported as CSV. It looks like this:

"First Name","Last Name","First Name","Last Name","Address","City","State"
"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"
"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"
"Mats","Lindgren","Lucas","Lindgren","237 strawberry xing","houston","tx"

I have a class called "Category" that has a name variable. My code creates a Category for each string in the first line, but now I need to add each item to the column it belongs in.

import xlutils
from difflib import SequenceMatcher
from address import AddressParser, Address
from nameparser import HumanName
import xlrd
import csv

class Category:
    name = ""
    contents = []
    index = 0

columns = []
alltext = ""

with open('test.csv', 'rb') as csvfile:
    document = csv.reader(csvfile, delimiter=',', quotechar='\"')
    for row in document:
        alltext = alltext + ', '.join(row) + "\n"

    splitText = alltext.split('\n')


    categoryNames = splitText[0].split(', ')
    ixt = 0
    for name in categoryNames:
        thisCategory = Category()
        thisCategory.name = name
        thisCategory.index = ixt
        columns.append(thisCategory)
        ixt = ixt + 1


    for line in splitText:
        if(line != splitText[0] and len(line) != 0):
            individualItems = line.split(', ')
            for index, item in enumerate(individualItems):
                if(columns[index].index == index):
                    print(item + " (" + str(index) + ") is being sent to " + columns[index].name)
                    columns[index].contents.append(item)
    for col in columns:
        print("-----" + col.name + " (" + str(col.index) + ")-----")
        for stuff in col.contents:
            print(stuff)

As the code runs, it gives an output for each item that says:

Bob (0) is being sent to First Name
Robertson (1) is being sent to Last Name

Which is what it should be doing. Every item says that it is being sent to the correct category. At the end, however, instead of having each item be in the category that it claims, every category has every item, and instead of this:

-----First Name-----
Bob
Roberta
Leo
Carter
Mats
Lucas

And so on and so forth, for each of the categories. I get this:

-----First Name-----
Bob
Robertson
Roberta
Robertson
123 South Street
Salt Lake City
UT
Leo
Smart
Carter
Smart
827 Cherry Street
Macon
GA
Mats
Lindgren
Lucas
Lindgren
237 strawberry xing
houston
tx

I don't know what is going wrong. There is nothing in between those two lines of code that could possibly be messing it up.

Upvotes: 0

Answers (3)

nambii

Reputation: 3

Try reading the CSV with the statements below.

import csv
data = []
with open("test.csv") as f :
    document = csv.reader(f)
    for line in document :
        data.append(line)

Here data[0] holds all the category names, and the remaining entries of data hold the data rows.
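
Since the goal is one list per column, the rows collected this way can then be transposed. A minimal sketch, assuming the sample CSV from the question (inlined through io.StringIO so it runs without test.csv) and assuming every row has the same number of fields; the names header, rows and columns are only illustrative:

```python
import csv
import io

# Sample data from the question, inlined so the sketch runs without test.csv.
SAMPLE = '''"First Name","Last Name","First Name","Last Name","Address","City","State"
"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"
"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"
"Mats","Lindgren","Lucas","Lindgren","237 strawberry xing","houston","tx"
'''

data = list(csv.reader(io.StringIO(SAMPLE)))
header, rows = data[0], data[1:]

# zip(*rows) turns the list of rows into a sequence of columns.
# It stops at the shortest row, so this assumes equal-length rows.
columns = [list(col) for col in zip(*rows)]

for name, col in zip(header, columns):
    print(name, col)
```

With a real file, replacing io.StringIO(SAMPLE) with the opened file object should behave the same way.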

Upvotes: 0

tdelaney

Reputation: 77337

The problem is that you defined class-level variables for Category, not instance variables. That was mostly harmless for

thisCategory.name = name
thisCategory.index = ixt

because that created instance variables for each object that mask the class variable. But

columns[index].contents.append(item)

is different. It looks up the single class-level contents list and appends the data to it, regardless of which instance was used to reach it.
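
A small sketch of that difference, using two hypothetical classes (Shared and PerInstance are made up for illustration and are not from the question):

```python
class Shared:
    name = ""        # class-level attribute
    contents = []    # one list, stored on the class itself


class PerInstance:
    def __init__(self):
        self.contents = []  # a fresh list for every object


a, b = Shared(), Shared()

a.name = "First Name"      # assignment creates an instance attribute on a only
print(repr(b.name))        # '' (b still sees the class attribute)

a.contents.append("Bob")   # no assignment: this mutates the class-level list
print(b.contents)          # ['Bob']
print(a.contents is b.contents)  # True

c, d = PerInstance(), PerInstance()
c.contents.append("Bob")
print(d.contents)          # []
```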

The solution is to use instance variables created in __init__. Also, you were doing too much work reassembling things into strings then breaking them out again. Just process the columns as the rows are read.

#import xlutils
#from difflib import SequenceMatcher
#from address import AddressParser, Address
#from nameparser import HumanName
#import xlrd
import csv

class Category:

    def __init__(self, index, name):
        self.name = name
        self.index = index
        self.contents = []

columns = []

with open('test.csv', 'r', newline='') as csvfile:
    document = csv.reader(csvfile, delimiter=',', quotechar='\"')
    # create categories from first row
    columns = [Category(index, name) 
        for index, name in enumerate(next(document))]
    # add each cell of the remaining rows to its column
    for row in document:
        if row:
            for index, cell in enumerate(row):
                columns[index].contents.append(cell)

for col in columns:
    print("-----" + col.name + " (" + str(col.index) + ")-----")
    for stuff in col.contents:
        print(stuff)

Upvotes: 1

Alan

Reputation: 3042

3 comments:

You aren't taking into account the first field - you take an empty string alltext = "" and the first thing you do is add a comma. This is pushing everything one field over. You would need to test if you are on the first row.
You are opening a csv ... then twisting it back into a text file. This looks like it is because the csv module separates the values into fields and you want to do that manually later on. If you open the file as a text file in the first place and read it using read, you don't need the first part of the code (unless you have done something very strange to your csv; since we don't have a sample to examine I can't comment on that).
```
with open('test.csv', 'r') as f:
    document = f.read()
```

will give you the correctly formatted alltext string.

This is a good use-case for csv.DictReader, which will give you the fields in a structured format. See this StackOverflow question as an example and the documentation.
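
A rough sketch of the csv.DictReader approach, assuming the sample CSV from the question (inlined through io.StringIO). One caveat: the sample header repeats "First Name" and "Last Name", and a dict can hold only one value per key, so the later column would overwrite the earlier one. The sketch therefore skips the original header line and passes its own hypothetical unique names through the fieldnames parameter; those names are an assumption, not part of the original file.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question, inlined so the sketch runs without test.csv.
SAMPLE = '''"First Name","Last Name","First Name","Last Name","Address","City","State"
"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"
"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"
"Mats","Lindgren","Lucas","Lindgren","237 strawberry xing","houston","tx"
'''

# Hypothetical unique column names (the original header has duplicates).
FIELDNAMES = [
    "First Name 1", "Last Name 1",
    "First Name 2", "Last Name 2",
    "Address", "City", "State",
]

stream = io.StringIO(SAMPLE)
next(stream)  # skip the original header line; fieldnames are supplied instead
reader = csv.DictReader(stream, fieldnames=FIELDNAMES)

# Collect one list per column name.
columns = defaultdict(list)
for row in reader:
    for name, value in row.items():
        columns[name].append(value)

print(columns["First Name 2"])  # ['Roberta', 'Carter', 'Lucas']
```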

Upvotes: 0
