Reputation: 664
Sorry for the stupid question. I am not sure if I am just tired, but I am having a hard time figuring out the logic for solving this problem.
I have a csv that looks like this:
Company,CompanyName,
Website,WebsiteName ,
Website, WebsiteName2,
Email, emailData,
Company,NextCompanyName,
Website,websiteName,
Website, WebsiteName2,
Company,NextCompanyName,
Name,PersonName,
Website,websiteName,
As you can see, it is pretty nasty data. I would like to read in the entire CSV, group the lines by company, and organize as much of the data as possible. Sometimes a company has a person's name, sometimes multiple websites, sometimes an email, and sometimes none of these.
So my desired output would be: Company Name, Person's Name, Email Address, Web1, Web2, etc
The good news is that every row starts with a label (Company, Website, Name, etc). What I want to do is read through the CSV, and when it finds a row that looks like Company, CompanyName, start a new output row and sort the data that follows into columns (Name to the Name column, email to the Email column, etc) until it runs into another row that looks like Company, CompanyName.
I don't need help reading from or writing to the CSV. I am looking for help on how to properly iterate over the data and sort it to where it needs to be.
Thanks for any suggestions you can give me
Upvotes: 1
Views: 65
Reputation: 77337
You can check for a record-start condition as you iterate over the lines of the file. Record each key/value pair in a dict, and when you see the next start row, you know the existing record is complete.
Make each value in the record dict a list so you can append new values as you find them. Remember to save the final record after the loop, since no start row follows it.
from collections import defaultdict
import csv

filename = 'mytest.csv'

# test data
with open(filename, 'w') as out_fp:
    out_fp.write("""Company,CompanyName,
Website,WebsiteName ,
Website, WebsiteName2,
Email, emailData,
Company,NextCompanyName,
Website,websiteName,
Website, WebsiteName2,
Company,NextCompanyName,
Name,PersonName,
Website,websiteName,""")

# will hold one dict per company
records = []
with open(filename, newline='') as in_fp:
    record = defaultdict(list)
    for row in csv.reader(in_fp):
        if len(row) >= 2:
            if row[0].strip() == "Company" and "Company" in record:
                # found a new company: save the previous record and start fresh
                records.append(record)
                record = defaultdict(list)
            record[row[0].strip()].append(row[1].strip())
    # the last company has no following "Company" row, so save it here
    if record:
        records.append(record)

for record in records:
    print('----')
    print(record)
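To get from those per-company dicts to the layout the question asks for (Company Name, Person's Name, Email Address, Web1, Web2, ...), one possible follow-up step is sketched below. It is only an illustration under some assumptions: the `flatten` helper, the "; " separator for repeated names or emails, and the inline sample `records` are all made up for the example, and the sample simply mirrors the shape the loop above produces.

```python
import csv
import io

# Hypothetical sample in the same shape as `records` above:
# one dict per company, each value a list of strings.
records = [
    {"Company": ["CompanyName"], "Website": ["WebsiteName", "WebsiteName2"],
     "Email": ["emailData"]},
    {"Company": ["NextCompanyName"], "Website": ["websiteName", "WebsiteName2"]},
    {"Company": ["NextCompanyName"], "Name": ["PersonName"],
     "Website": ["websiteName"]},
]


def flatten(records):
    """Return a header row followed by one padded row per company."""
    # The widest website list decides how many Web columns are needed.
    max_sites = max((len(r.get("Website", [])) for r in records), default=0)
    header = ["Company", "Name", "Email"]
    header += [f"Web{i + 1}" for i in range(max_sites)]
    rows = [header]
    for r in records:
        sites = list(r.get("Website", []))
        # Assumption: several names or emails for one company share a cell.
        row = [
            "; ".join(r.get("Company", [])),
            "; ".join(r.get("Name", [])),
            "; ".join(r.get("Email", [])),
        ]
        # Pad with empty strings so every row has the same width.
        row += sites + [""] * (max_sites - len(sites))
        rows.append(row)
    return rows


rows = flatten(records)

# Write to an in-memory buffer here; a real script would open a file instead.
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

Using `.get()` rather than indexing means the same function works on plain dicts and on `defaultdict` records without creating empty keys as a side effect.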
Upvotes: 1
Reputation: 302
You could use a simple condition and sort everything into lists, or even a single dictionary (that is a little more complicated, I think, but not by much):
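companyList = []
with open("foo.csv", "r") as f:
    for line in f:
        fields = line.split(',')
        # compare the first field exactly, so rows that merely contain "Company" are skipped
        if len(fields) >= 2 and fields[0].strip() == "Company":
            companyList.append(fields[1].strip())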
Do the same with a list for each of your row types, then rebuild your CSV how you want it to be and write it out.
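The single-dictionary variant mentioned above could look roughly like the sketch below. Treat it as one possible reading of the idea rather than the only way to do it: the `group_by_field` helper and the fixed set of labels (Company, Name, Email, Website) are assumptions for the example, and the sample text is copied from the question instead of being read from a file. Each key holds a list with one slot per company, so index `i` in every list refers to the same company.

```python
import csv
import io

# Sample data copied from the question; a real script would open the file instead.
SAMPLE = """Company,CompanyName,
Website,WebsiteName ,
Website, WebsiteName2,
Email, emailData,
Company,NextCompanyName,
Website,websiteName,
Website, WebsiteName2,
Company,NextCompanyName,
Name,PersonName,
Website,websiteName,
"""


def group_by_field(lines):
    """Collect values into one dict of parallel lists, one slot per company."""
    columns = {"Company": [], "Name": [], "Email": [], "Website": []}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        key, value = row[0].strip(), row[1].strip()
        if key == "Company":
            # Open a new slot in every column for this company.
            columns["Company"].append(value)
            for other in ("Name", "Email", "Website"):
                columns[other].append([])
        elif key in columns and columns["Company"]:
            # Attach the value to the most recent company.
            columns[key][-1].append(value)
    return columns


columns = group_by_field(io.StringIO(SAMPLE))
for i, company in enumerate(columns["Company"]):
    print(company, columns["Name"][i], columns["Email"][i], columns["Website"][i])
```

Rows with an unknown label, or rows that appear before the first Company row, are silently ignored in this sketch; whether that is the right behavior depends on the real data.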
Upvotes: 0