user3296793

Reputation: 270

how to parse a txt file to csv and modify formatting

Is there a way I can use python to take my animals.txt file results and convert it to csv and format it differently?

Currently the animals.txt file looks like this:

ID:- 512
NAME:- GOOSE
PROJECT NAME:- Random
REPORT ID:- 30321
REPORT NAME:- ANIMAL
KEYWORDS:- ['"help,goose,Grease,GB"']


ID:- 566
NAME:- MOOSE
PROJECT NAME:- Random
REPORT ID:- 30213
REPORT NAME:- ANIMAL
KEYWORDS:- ['"Moose, boar, hansel"']

I would like the CSV file to present it as:

ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS

Followed by the results underneath each header

Here is the script I have written so far:

import re
import csv
with open("animals.txt") as f: text = f.read()

data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    data[k] = re.findall(r'%s:- (.*)' % k, text)


csv_file = 'out.csv'

with open(csv_file, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=keys) 
    writer.writeheader()
    for x in data:
        writer.writerow(x)

Upvotes: 1

Views: 1676

Answers (3)

knh190

Reputation: 2882

An easy way is to parse the fields with a regex and store them in a dict before writing the final CSV:

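import re

# read the whole input file into `text`
with open('animals.txt') as f:
    text = f.read()

data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    # ^ with re.M anchors the key at the start of a line, so 'ID' does not
    # also match inside 'REPORT ID' (same for 'NAME')
    data[k] = re.findall(r'^%s:- (.*)' % re.escape(k), text, re.M)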

And to CSV:

import csv

csv_file = 'out.csv'

# newline='' lets the csv module control line endings itself
with open(csv_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(keys)
    for i in range(len(data[keys[0]])):
        writer.writerow([data[k][i] for k in keys])

Output in csv:

ID,NAME,PROJECT NAME,REPORT ID,REPORT NAME,KEYWORDS
512,GOOSE,Random,30321,ANIMAL,['\"help\,goose\,Grease\,GB\"']
566,MOOSE,Random,30213,ANIMAL,['\"Moose\, boar\, hansel\"']

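Note that the pattern needs a ^ anchor together with re.M (multiline mode): there is a catch in your text, because ID also occurs inside REPORT ID, and NAME inside PROJECT NAME and REPORT NAME, so an unanchored pattern matches those lines as well. The per-key lists also have to be transposed into rows before writing, since the writer expects one row at a time.

With quoting turned off, the writer uses \ to escape the quotes and commas inside KEYWORDS.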
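Since the question tried csv.DictWriter, here is a minimal sketch of how the same regex approach could feed it. The helper names parse_records and write_csv are made up for illustration, and the sketch assumes every record contains all six keys (otherwise the per-key lists would not line up).

```python
import csv
import re

KEYS = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']


def parse_records(text):
    """Return one dict per record (hypothetical helper, assumes complete records)."""
    # one list of values per key, anchored at line start so 'ID' skips 'REPORT ID'
    columns = [
        re.findall(r'^%s:- (.*)' % re.escape(k), text, re.M)
        for k in KEYS
    ]
    # zip(*columns) turns the six per-key lists into one tuple per record
    return [dict(zip(KEYS, values)) for values in zip(*columns)]


def write_csv(records, path):
    """Write the record dicts to `path` with a header row (hypothetical helper)."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=KEYS)
        writer.writeheader()
        # default quoting wraps fields that contain commas or quotes
        writer.writerows(records)
```

It could be called as write_csv(parse_records(text), 'out.csv'), where text is the content of animals.txt. DictWriter.writerow expects a dict per row, which is why looping over the keys of data and passing each key to writerow does not work.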

Upvotes: 3

Xenobiologist

Reputation: 2151

Here is the same thing in AutoIt (www.autoitscript.com):

Global $values_A = StringRegExp(FileRead("animals.txt"), '(?:ID|NAME|KEYWORDS):-\s(.*)', 3)
For $i = 0 To UBound($values_A) - 1 Step +6
    FileWrite('out.csv', $values_A[$i] & ',' & $values_A[$i + 1] & ',' & $values_A[$i + 2] & ',' & $values_A[$i + 3] & ',' & $values_A[$i + 4] & ',' & $values_A[$i + 5] & @CRLF)
Next

Upvotes: -2

Toni Sredanović

Reputation: 2412

This should work:

fname = 'animals.txt'

with open(fname) as f:
    content = f.readlines()

content = [x.strip() for x in content]

output = 'ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS\n'

line_output = ''
for line in content:
    if line:
        # keep only the value after the ':-' separator
        line_output += line.split(':-', 1)[-1].strip() + ','
    elif line_output:
        # a blank line ends the current record
        output += line_output.rstrip(',') + '\n'
        line_output = ''

# flush the last record if the file does not end with a blank line
if line_output:
    output += line_output.rstrip(',') + '\n'

print(output)
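One caveat: the KEYWORDS values contain commas, so joining the values with ',' gives rows that a CSV reader splits into more than six columns. A rough sketch of the same line-by-line idea with the csv module doing the quoting follows. The function name records_to_csv is made up for illustration, and it assumes records are separated by at least one blank line.

```python
import csv
import io
import re


def records_to_csv(text):
    """Convert 'KEY:- value' blocks into CSV text (hypothetical helper)."""
    rows = []
    # one or more blank lines separate two records
    for block in re.split(r'\n\s*\n', text.strip()):
        # keep only the value after the first ':-' on each non-empty line
        values = [
            line.partition(':-')[2].strip()
            for line in block.splitlines()
            if line.strip()
        ]
        if values:
            rows.append(values)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS'])
    # default quoting wraps any field that contains a comma or a quote
    writer.writerows(rows)
    return out.getvalue()
```

Passing the content of animals.txt to records_to_csv returns the CSV as a string, which can be printed or written to a file.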

Upvotes: 1
