user8151026

How to convert this text file to csv?

I'm trying to analyze a text file with data organized in columns and records. My file:

Name     Surname    Age    Sex      Grade
Chris      M.        14     M       4
Adam       A.        17     M
Jack       O.               M       8

The text file has some empty fields, as shown above. I want to show Name and Grade. My code so far:

import csv

with open('launchlog.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split() for line in stripped if line)
    with open('log.txt', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Name', 'Surname', 'Age', 'Sex', 'Grade'))
        writer.writerows(lines)

log.txt:

Chris,M.,14,M,4
Adam,A.,17,M
Jack,O.,M,8

How can I insert a "None" string for the empty fields? For example:

Chris,M.,14,M,4
Adam,A.,17,M,None
Jack,O.,None,M,8

What would be the best way to do this in Python?

Upvotes: 0

Views: 714

Answers (4)

martineau

Reputation: 123413

Here's something in Pure Python™ that seems to do what you want, at least on the sample data file in your question.

In a nutshell, it first determines where each field name in the column header line starts and ends. Then, for each remaining line of the file, it does the same thing to get a second list, which is used to determine which column each data item in the line sits underneath (the item is then put in its proper position in the row written to the output file).

import csv

def find_words(line):
    """ Return a list of (start, stop) tuples with the indices of the
        first and last characters of each "word" in the given string.
        Any sequence of consecutive non-space characters is considered
        as comprising a word.
    """
    line_len = len(line)
    indices = []
    i = 0
    while i < line_len:
        start, count = i, 0
        while line[i] != ' ':
            count += 1
            i += 1
            if i >= line_len:
                break
        indices.append((start, start+count-1))

        while i < line_len and line[i] == ' ':  # advance to start of next word
            i += 1

    return indices


# convert text file with missing fields to csv
with open('name_grades.txt', 'rt') as in_file, open('log.csv', 'wt', newline='') as out_file:
    writer = csv.writer(out_file)
    header = next(in_file)  # read first line
    fields = header.split()
    writer.writerow(fields)

    # determine the indices of where each field starts and stops based on header line
    field_positions = find_words(header)

    for line in in_file:
        line = line.rstrip('\r\n')  # remove trailing newline
        row = ['None' for _ in range(len(fields))]
        value_positions = find_words(line)
        for (vstart, vstop) in value_positions:
            # determine what field the value is underneath
            for i, (hstart, hstop) in enumerate(field_positions):
                if vstart <= hstop and hstart <= vstop:  # overlap?
                    row[i] = line[vstart:vstop+1]
                    break  # stop looking

        writer.writerow(row)

Here are the contents of the log.csv file it created:

Name,Surname,Age,Sex,Grade
Chris,M.,14,M,4
Adam,A.,17,M,None
Jack,O.,None,M,8
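The column-matching idea described in this answer can also be sketched more compactly with `re.finditer(r'\S+', ...)`, which yields the span of every run of non-whitespace characters. This is a hedged sketch, not martineau's code: `word_spans` and `split_by_header` are hypothetical names, and like the original it assumes every value overlaps the header word it sits under.

```python
import re

def word_spans(line):
    """Return inclusive (start, stop) index pairs for each run of non-whitespace characters."""
    return [(m.start(), m.end() - 1) for m in re.finditer(r'\S+', line)]

def split_by_header(header, line, missing='None'):
    """Place each value of `line` in the column whose header word it overlaps."""
    columns = word_spans(header)
    row = [missing] * len(columns)
    for vstart, vstop in word_spans(line):
        for i, (hstart, hstop) in enumerate(columns):
            # Two inclusive intervals overlap when each one starts before the other ends
            if vstart <= hstop and hstart <= vstop:
                row[i] = line[vstart:vstop + 1]
                break
    return row

header = 'Name     Surname    Age    Sex      Grade'
print(split_by_header(header, 'Adam       A.        17     M'))
# ['Adam', 'A.', '17', 'M', 'None']
print(split_by_header(header, 'Jack       O.               M       8'))
# ['Jack', 'O.', 'None', 'M', '8']
```

One difference from `find_words`: `\S+` also treats tabs as separators, whereas the original loop only splits on spaces.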

Upvotes: 1

ramesh

Reputation: 1207

Without using pandas:

Edited based on your comment: I hard-coded this solution for your data, so it will not work for rows that don't have a Surname value.
I'm writing out Name and Grade since you only need those two columns.

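# expects comma-separated lines such as those in your log.txt
with open("inFIle.txt") as f, open("out.txt", 'w') as o:
    for line in f:
        fields = line.strip("\n").split(",")
        try:
            grade = int(fields[-1])
        except ValueError:
            # last field is not a number, so this row has no Grade
            print(fields)
            continue
        # skip rows where the field before Grade looks like a surname (ends with '.')
        if not fields[-2].endswith('.'):
            o.write(fields[0] + "," + str(grade) + "\n")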

Upvotes: 0

SteveJ

Reputation: 3313

I would use baloo's answer over mine, but if you just want to get a feel for where your code went wrong, the solution below mostly works (there is a formatting issue with the Grade field, but I'm sure you can get through that). Add some print statements to your code and to mine and you should be able to pick up the differences.

import csv

<Old Code removed in favor of new code below>

EDIT: I see your difficulty now. Please try the code below. I'm out of time today, so you will have to fill in the writer part where the print statement is, but this will fulfill your request to replace empty fields with None.

import csv

with open('Test.txt', 'r') as in_file:
    with open('log.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        lines = in_file.readlines()
        for line in lines[1:]:
            # hard-coded column boundaries that match the sample file's layout
            parts = line[0:10], line[11:19], line[20:24], line[25:31], line[32:]
            new_line = list()
            for part in parts:
                val = part.strip()  # removes the padding and the trailing '\n'
                val = val if val != '' else 'None'
                new_line.append(val)
            print(new_line)
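Filling in the writer part could look roughly like the sketch below. It keeps SteveJ's hard-coded slice positions (which only fit the sample file's layout), replaces the `print` with `writer.writerow`, writes the header row and skips blank lines. `COLUMN_SLICES` and `convert` are hypothetical names, and `io.StringIO` stands in for real files so the example is self-contained.

```python
import csv
import io

# Column boundaries copied from the answer above; they only match the sample layout
COLUMN_SLICES = [slice(0, 10), slice(11, 19), slice(20, 24), slice(25, 31), slice(32, None)]

def convert(in_file, out_file):
    """Write a CSV version of the fixed-width text, using 'None' for empty fields."""
    writer = csv.writer(out_file)
    lines = in_file.readlines()
    writer.writerow(lines[0].split())  # header: Name, Surname, Age, Sex, Grade
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        # strip() removes the padding and trailing newline; empty slots become 'None'
        writer.writerow([line[s].strip() or 'None' for s in COLUMN_SLICES])

sample = io.StringIO(
    'Name     Surname    Age    Sex      Grade\n'
    'Chris      M.        14     M       4\n'
    'Adam       A.        17     M\n'
    'Jack       O.               M       8\n'
)
out = io.StringIO()
convert(sample, out)
print(out.getvalue())
```

With real files, the call would be `convert(f, g)` inside `with open('Test.txt') as f, open('log.csv', 'w', newline='') as g:`.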

Upvotes: 0

baloo
baloo

Reputation: 527

Use pandas:

import pandas
data=pandas.read_fwf("file.txt")

To get your dictionary:

data.set_index("Name")["Grade"].to_dict()
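For comparison, a similar Name to Grade dictionary can be built with only the standard library once the data is in CSV form with "None" placeholders (for example the log.csv produced by the first answer). This is a hedged sketch: `grades_by_name` is a hypothetical helper, and missing grades map to Python's `None`, whereas pandas represents missing values as `NaN`.

```python
import csv
import io

def grades_by_name(csv_file):
    """Map each Name to its Grade as an int, or None when the grade is missing."""
    result = {}
    for row in csv.DictReader(csv_file):
        grade = row['Grade']
        result[row['Name']] = None if grade == 'None' else int(grade)
    return result

sample = io.StringIO(
    'Name,Surname,Age,Sex,Grade\n'
    'Chris,M.,14,M,4\n'
    'Adam,A.,17,M,None\n'
    'Jack,O.,None,M,8\n'
)
print(grades_by_name(sample))
# {'Chris': 4, 'Adam': None, 'Jack': 8}
```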

Upvotes: 2

Related Questions