Christopher
Christopher

Reputation: 691

Inserting data into two columns of csv

My test1111.csv looks similar to this:

Sales #, Date, Tel Number, Comment
393ED3, 5/12/2010, 5555551212, left message
585E54, 6/15/2014, 5555551213, voice mail
585868, 8/16/2010, , number is 5555551214

I have the following code:

import re
import csv
from collections import defaultdict

# Below code places csv entries into dictionary so that they can be parsed
# by column.  Then print statement prints Sales # column.
columns = defaultdict(list)
with open("c:\\test1111.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k,v) in row.items():
            columns[k].append(v)

# To print all columns, use: print columns
# To print a specific column, use:  print(columns['ST'])
# Below line takes list output and separates into new lines
sales1 = "\n".join(columns['Sales #'])
print sales1

# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.
with open("c:\\test1111.csv", "r") as old, \
     open("c:\\results1111.csv", 'wb') as new:
    for line in old:
    #Regex to match exactly 10 digits
        match = re.search('(?<!\d)\d{10}(?!\d)', line)
        if match:
            match1 = match.group()
            print match1
            new.writelines((match1) + '\n')
        else:
            nomatch = "No match"
            print nomatch
            new.writelines((nomatch) + '\n')

The first section of the code opens the original csv and prints all entries from the Sales # column to stdout with each entry in its own row.

The second section of the code opens the original csv and searches every row for a 10 digit number. When it finds one it writes each one (or no match) to each row of a new csv.

What I would like to now do is to also write the sales column data to the new csv. So ultimately, the sales column data would appear as rows in the first column and the regex data would appear as rows in the second column in the new csv. I have been having trouble getting that to work as the new.writelines won't take two arguments. Can someone please help me with how to accomplish this?

I would like the results1111.csv to look like this:

393ED3, 5555551212
585E54, 5555551213
585868, 5555551214

Upvotes: 1

Views: 182

Answers (2)

Imanol Luengo
Imanol Luengo

Reputation: 15889

Just to point out that in a CSV, unless the spaces are really needed, they shouldn't be there. Your data should look like this:

Sales #,Date,Tel Number,Comment
393ED3,5/12/2010,5555551212,left message
585E54,6/15/2014,5555551213,voice mail
585868,8/16/2010,,number is 5555551214

And, adding a new way of getting the same answer, you can use Pandas data analysis libraries for task involving data tables. It will only be 2 lines for what you want to achieve:

>>> import pandas as pd
# Read data
>>> data = pd.DataFrame.from_csv('/tmp/in.cvs')
>>> data
             Date  Tel Number               Comment
Sales#                                             
393ED3  5/12/2010  5555551212          left message
585E54  6/15/2014  5555551213            voice mail
585868  8/16/2010         NaN  number is 5555551214

# Write data
>>> data.to_csv('/tmp/out.cvs', columns=['Tel Number'], na_rep='No match')

That last line will write to out.cvs the column Tel Number inserting No match when no telephone number is found, exactly what you want. Output file:

Sales#,Tel Number
393ED3,5555551212.0
585E54,5555551213.0
585868,No match

Upvotes: 1

Luigi
Luigi

Reputation: 4129

Starting with the second part of your code, all you need to do is concatenate the sales data within your writelines:

sales_list = sales1.split('\n')

# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.

with open("c:\\test1111.csv", "r") as old, \
     open("c:\\results1111.csv", 'wb') as new:
    i = 0 # counter to add the proper sales figure
    for line in old:
        #Regex to match exactly 10 digits
        match = re.search('(?<!\d)\d{10}(?!\d)', line)
        if match:
            match1 = match.group()
            print match1
            new.writelines(str(sales_list[i])+ ',' + (match1) + '\n')
        else:
            nomatch = "No match"
            print nomatch
            new.writelines(str(sales_list[i])+ ',' + (nomatch) + '\n')
        i += 1

Using the counter i, you can keep track of what row you're on and use that to add the corresponding sales column figure.

Upvotes: 1

Related Questions