Reputation: 391

Python issue outputting multiline string into single csv.writer row

The below code works, but assigns each newline of a multiline string to a new row vs. the desired state of one cell.

While researching a possible solutions I read:

That I should try enclose the string in double quotes
That replacing newline with carriage return helps
Parsing as a list/tuple via [] declaration

Having tried all the above, I'm thoroughly confused as to why this isn't working. If I remove newline it assigns everything to a single cell in the row but then it's hard to read.

Is there no way to preserve format when outputting a string with csv.writer?

def run_rip():
os.chdir('/REGRIPPERDIR/RegRipper2.8')
for k in ntDict:
    run_command = "".join(["./rip.pl", " -r /APATH/NTUSER.DAT -p ", str(k)])
    process = subprocess.Popen(run_command,
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    pattern = re.compile('lastwrite|(\d{2}:\d{2}:\d{2})|alert|trust|Value')
    grouping = re.compile('(?P<first>.+?)(\n)(?P<second>.+?)([\n]{2})(?P<rest>.+[\n])', re.MULTILINE | re.DOTALL)
    if pattern.findall(out):
        match = re.search(grouping, out)
        first = match.group('first')
        second =match.group('second')
        rest = ('"%s' % os.linesep.join([s for s in match.group('rest').splitlines() if s]))
        rest = rest.replace('\n','\r\n')
        headers = ['Name', 'Description', 'Found']
        f = csv.writer(open('/APATH/ntuser.csv', 'ab'),
                       dialect=csv.excel,
                       delimiter='\t')
        f.writerow(headers)
        f.writerow((first, second, rest))
        print(out)
    ntDict.popitem(last=False)

run_rip()

Sample Output:

/edit: I was requested in the comments below to post a sample of the third string rest as it is collected. The below text is what would be passed to csv.writer.

Baseline\n #First string as defined by the regex
(All) scans a hive\n #Second String as defined by the regex

Total values checked    : 64\n   #This and below is the rest string
Number of binary value lengths : 11\n
...\n
Value -5: 176 bytes [# times: 1]\n
Value -4: 712 bytes [# times: 5]\n

Desired State:

Upvotes: 1

Answers (4)

tagny

Reputation: 9

You can simply use \n to separate the items in the multiline cell. Just make sure the whole cell content appears between double-quote:

f = open(filename, "w", encoding='utf-8')

f.write('a,b,"c\nd",e\n')

This example writes a row in a csv file where the third cell has 2 lines c and d.

Upvotes: 1

Hussain Bohra

Reputation: 1005

this is how I solved this problem.

Input Data:

('[email protected]', 'firstName', 'lastName', 'Address1', 'Address1
Address2', 'IP1
IP2
IP3', 'PhoneNumber')

Desired CSV Format:

Here is a Python code to get the same:

try:
    cs.execute(sql)
    row = cs.fetchone()
    while row is not None:
        email = row[0]
        filename = '{0}.csv'.format(email)
        with open(filename, "w", encoding='utf-8-sig', newline='') as fp:
            writer = csv.writer(fp)
            writer.writerow(('REQUESTER EMAIL', 'FIRST NAME', 'LAST NAME', 'BILLING ADDRESSES', 'SHIPPING ADDRESSES', 'IP ADDRESSES', 'PHONE NUMBER'))
            writer.writerow((
                row[0],
                row[1],
                row[2],
                row[3],
                row[4],
                row[5],
                row[6],
            ))
        row = cs.fetchone()
finally:
    cs.close()
ctx.close()

See this line (#7):

open(filename, "w", encoding='utf-8-sig', newline='')

setting up a unicode to 'utf-8-sig' is doing all trick.

Thanks, Hussain Bohra

Upvotes: 0

ImNotLeet

Reputation: 391

With the guidance of the comments I found my answer. Simply excel borks up formatting (for reasons covered in the comments). However when opened in LibreOffice the formatting is retained.

The suggested thread in the comments (Importing CSV with line breaks in Excel 2007) has a purposed workaround which includes quoting the actual newlines vs. quoting the entire string, which is what I was doing.

Upvotes: 0

demented hedgehog

Reputation: 7548

Not an answer... but I want code formatting.

import csv
x = ("1", "2\n3", "4")
f = csv.writer(open('foo', 'w'),
                   dialect=csv.excel,
                   delimiter='\t')
f.writerow(x)

Produces this:

$ less foo
1       "2
3"      4

And that's "valid" tab separated csv.. it's just that excel doesn't handle it "properly". Where things in quotes are really more implementation issues since it's not a standardized format. The extra double quotes around the 23 are kind of annoying.

Check out https://pypi.python.org/pypi/xlwt (pre xml excel formats) or http://xlsxwriter.readthedocs.org/ (xml excel formats) for 3rd party libraries to write excel directly.

Upvotes: 2

Python issue outputting multiline string into single csv.writer row

Answers (4)

Related Questions