Reputation: 691
My test1111.csv looks similar to this:
Sales #, Date, Tel Number, Comment
393ED3, 5/12/2010, 5555551212, left message
585E54, 6/15/2014, 5555551213, voice mail
585868, 8/16/2010, , number is 5555551214
I have the following code:
import re
import csv
from collections import defaultdict
# Below code places csv entries into dictionary so that they can be parsed
# by column. Then print statement prints Sales # column.
columns = defaultdict(list)
with open("c:\\test1111.csv", "r") as f:
reader = csv.DictReader(f)
for row in reader:
for (k,v) in row.items():
columns[k].append(v)
# To print all columns, use: print columns
# To print a specific column, use: print(columns['ST'])
# Below line takes list output and separates into new lines
sales1 = "\n".join(columns['Sales #'])
print sales1
# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.
with open("c:\\test1111.csv", "r") as old, \
open("c:\\results1111.csv", 'wb') as new:
for line in old:
#Regex to match exactly 10 digits
match = re.search('(?<!\d)\d{10}(?!\d)', line)
if match:
match1 = match.group()
print match1
new.writelines((match1) + '\n')
else:
nomatch = "No match"
print nomatch
new.writelines((nomatch) + '\n')
The first section of the code opens the original csv and prints all entries from the Sales # column to stdout with each entry in its own row.
The second section of the code opens the original csv and searches every row for a 10 digit number. When it finds one it writes each one (or no match) to each row of a new csv.
What I would like to now do is to also write the sales column data to the new csv. So ultimately, the sales column data would appear as rows in the first column and the regex data would appear as rows in the second column in the new csv. I have been having trouble getting that to work as the new.writelines won't take two arguments. Can someone please help me with how to accomplish this?
I would like the results1111.csv to look like this:
393ED3, 5555551212
585E54, 5555551213
585868, 5555551214
Upvotes: 1
Views: 182
Reputation: 15889
Just to point out that in a CSV, unless the spaces are really needed, they shouldn't be there. Your data should look like this:
Sales #,Date,Tel Number,Comment
393ED3,5/12/2010,5555551212,left message
585E54,6/15/2014,5555551213,voice mail
585868,8/16/2010,,number is 5555551214
And, adding a new way of getting the same answer, you can use Pandas data analysis libraries for task involving data tables. It will only be 2 lines for what you want to achieve:
>>> import pandas as pd
# Read data
>>> data = pd.DataFrame.from_csv('/tmp/in.cvs')
>>> data
Date Tel Number Comment
Sales#
393ED3 5/12/2010 5555551212 left message
585E54 6/15/2014 5555551213 voice mail
585868 8/16/2010 NaN number is 5555551214
# Write data
>>> data.to_csv('/tmp/out.cvs', columns=['Tel Number'], na_rep='No match')
That last line will write to out.cvs
the column Tel Number
inserting No match
when no telephone number is found, exactly what you want. Output file:
Sales#,Tel Number
393ED3,5555551212.0
585E54,5555551213.0
585868,No match
Upvotes: 1
Reputation: 4129
Starting with the second part of your code, all you need to do is concatenate the sales data within your writelines
:
sales_list = sales1.split('\n')
# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.
with open("c:\\test1111.csv", "r") as old, \
open("c:\\results1111.csv", 'wb') as new:
i = 0 # counter to add the proper sales figure
for line in old:
#Regex to match exactly 10 digits
match = re.search('(?<!\d)\d{10}(?!\d)', line)
if match:
match1 = match.group()
print match1
new.writelines(str(sales_list[i])+ ',' + (match1) + '\n')
else:
nomatch = "No match"
print nomatch
new.writelines(str(sales_list[i])+ ',' + (nomatch) + '\n')
i += 1
Using the counter i
, you can keep track of what row you're on and use that to add the corresponding sales column figure.
Upvotes: 1