Reputation: 2386

Read in and format a text file with python 2.x

If I have data in a text file that looks like:

# this is a header
# and so is this
#
alligator 27.2 83.4
bear 23.9 90.2
cat 12.56 0.98
dog 15.97 0.88884
...
...

...I know I can read that data in (make a list of lists corresponding to the lines of data) by using the following block of code:

file1 = 'tmp.txt'

file1_data = []
data_input = open(file1,'r')
for line in data_input:
    if "#" not in line:
        line = line.strip().split()
        first_col_datum = line[0]
        second_col_datum = float(line[1])
        third_col_datum = float(line[2])
        file1_data.append([first_col_datum,second_col_datum,third_col_datum])
data_input.close()

...but my intuition tells me there is a much more elegant way to complete this task. Basically, I would like to read in the file line by line, ignore the '#' lines, and supply the command with a 'format' for each element in the line (like ["%s","%0.6f","%0.6f","%0.6f","%i"] or something similar; I will always know this a priori). What is the best practice for doing this?

Upvotes: 2

Answers (3)

Padraic Cunningham

Reputation: 180540

file1_data = []
with open(file1) as data_input: # with automatically closes your files
    # skip headers 
    next(data_input), next(data_input), next(data_input)
    for line in data_input:
        # unpack 
        first_col_datum, second_col_datum, third_col_datum = line.split()
        file1_data.append([first_col_datum,float(second_col_datum), float(third_col_datum)])

Output:

[['alligator', 27.2, 83.4], ['bear', 23.9, 90.2], ['cat', 12.56, 0.98], ['dog', 15.97, 0.88884]]

Or use itertools.islice to skip the headers:

from itertools import islice

with open(file1) as data_input:
    for line in islice(data_input,3,None):
        first_col_datum, second_col_datum, third_col_datum = line.split()
        file1_data.append([first_col_datum,float(second_col_datum),float(third_col_datum)])

print(file1_data)
[['alligator', 27.2, 83.4], ['bear', 23.9, 90.2], ['cat', 12.56, 0.98], ['dog', 15.97, 0.88884]]
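
If the number of header lines is not known in advance, a variation on the same idea is itertools.dropwhile, which is a different tool from islice: it discards lines only while they keep starting with "#". This is a minimal sketch; skip_leading_comments and the in-memory sample list are hypothetical stand-ins for the open file.

```python
from itertools import dropwhile

def skip_leading_comments(lines):
    # Drop lines only while they start with "#"; once a data line is seen,
    # everything after it passes through untouched.
    return dropwhile(lambda line: line.startswith("#"), lines)

# Stand-in for the open file object (any iterable of lines works).
sample = ["# this is a header\n", "#\n", "bear 23.9 90.2\n", "dog 15.97 0.88884\n"]

rows = []
for line in skip_leading_comments(sample):
    name, x, y = line.split()
    rows.append([name, float(x), float(y)])

print(rows)
```

Note that dropwhile only removes the leading run of comment lines; a "#" line appearing later in the file would still reach the loop body.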

I'm not sure I fully understand the formatting part or what you want to do with it, but if you want to format the values, use str.format:

([first_col_datum, "{:6f}".format(float(second_col_datum)),"{:6f}".format(float(third_col_datum))])
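
For reference, "{:6f}" sets a minimum field width of 6 and relies on the default precision of six decimals, whereas "{:.6f}" states the precision explicitly and is the closer match to "%0.6f". A small sketch of the explicit form, where format_row is a hypothetical helper name and the values come from the question's sample data:

```python
def format_row(name, x, y):
    # ".6f" asks for exactly six digits after the decimal point.
    return "{} {:.6f} {:.6f}".format(name, x, y)

formatted = format_row("cat", 12.56, 0.98)
print(formatted)  # cat 12.560000 0.980000
```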

If you were trying to ignore lines starting with # using an if statement you should use str.startswith:

if not line.startswith("#"):
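
To see why this matters, compare it with the "#" not in line test, which would also discard any data line that happens to contain a "#". A minimal sketch, assuming a made-up data row "item#7 1.5 2.5" that is not in the original file; is_header is a hypothetical helper name:

```python
def is_header(line):
    # A header is a line whose first non-blank character is "#".
    return line.lstrip().startswith("#")

lines = ["# this is a header", "item#7 1.5 2.5"]

# startswith keeps the data row even though it contains "#".
kept_with_startswith = [ln for ln in lines if not is_header(ln)]
# The substring test drops it along with the header.
kept_with_in = [ln for ln in lines if "#" not in ln]

print(kept_with_startswith)
print(kept_with_in)
```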

Not sure where in your question it says you want to write the data to a file but if you do:

from itertools import islice

with open(file1) as data_input, open("output.txt","w") as out:
    for line in islice(data_input,3,None):
        first_col_datum, second_col_datum, third_col_datum = line.split()
        out.write("{} {:6f} {:6f}\n".format(first_col_datum,float(second_col_datum), float(third_col_datum)))

Upvotes: 1

thiruvenkadam

Reputation: 4260

The simplest way to do this is with a lambda, used either in a list comprehension or with the map function:

desired_list = lambda str_list: [str_list[0], float(str_list[1]), float(str_list[2])]
# With list comprehension
with open(file1) as fo:
    output_list = [desired_list(content.strip().split(" ", 3)) for content in fo.read().split("\n") if content and '#' not in content]

# With filter and map function
output_list = []
with open(file1) as fo:
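    # keep non-empty lines that contain no '#'
    filtered_list = filter(lambda x: x and '#' not in x, fo.read().split("\n"))
    # split each line before converting, since desired_list expects a list of fields
    output_list = map(lambda x: desired_list(x.strip().split(" ", 3)), filtered_list)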

I would prefer putting the logic into a function and calling it rather than using lambda, much like Padraic Cunningham.

def desired_list(line):
    if not line.strip() or '#' in line.strip():
        return None
    line_list = line.split(" ", 3)
    return [line_list[0], float(line_list[1]), float(line_list[2])]

with open(file1) as fo:
    file_contents = fo.read().split("\n")
    output_list = filter(None, map(desired_list, file_contents))

This gives more control over the logic than the other two methods.
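
Taking the function-based approach one step further, the per-column 'format' from the question could be passed in as a list of converter callables. This is only a sketch under that assumption; parse_rows and its converters parameter are hypothetical names, and the sample list stands in for the open file.

```python
def parse_rows(lines, converters):
    """Yield one list per data line, applying converters[i] to column i."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and header lines that begin with "#".
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) != len(converters):
            raise ValueError("expected %d columns, got %d: %r"
                             % (len(converters), len(fields), line))
        yield [convert(field) for convert, field in zip(converters, fields)]

# Stand-in for the file; an open file object can be passed the same way.
sample = ["# this is a header", "#", "alligator 27.2 83.4", "", "cat 12.56 0.98"]
rows = list(parse_rows(sample, [str, float, float]))
print(rows)
```

With a real file this would be called as list(parse_rows(fo, [str, float, float])) inside the with block, and a fourth or fifth column only needs another entry such as int in the converter list.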

Upvotes: 1

vks

Reputation: 67998

If you want to write in the middle of the file, use the fileinput module.

import fileinput
import re
for line in fileinput.input("C:\\Users\\Administrator\\Desktop\\new.txt",inplace=True):
    if not re.match(r"^#.*$",line):
        #do the formatting
        print "something", #print("something", end ="") for python 3

Done in a few lines

Remember that whatever you print goes into the file, so you have to read and print every line, modifying whichever ones you want to replace. Also use print "asd", with the comma at the end: the comma is important because it prevents print from adding a newline.

You also don't want lines starting with #, so add the condition:

if not re.match(r"^#.*$",line):
    #do the formatting and print
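
Putting the pieces together, here is a rough sketch that runs under both Python 2.7 and 3 by writing to sys.stdout instead of using the print statement. Two things are assumptions of this sketch rather than part of the answer above: it writes the header lines back unchanged (any line that is not written is dropped from the file), and it works on a temporary demo file. reformat_in_place is a hypothetical name.

```python
import fileinput
import os
import sys
import tempfile

def reformat_in_place(path):
    # With inplace=True, stdout is redirected into the file while iterating,
    # so every line that should survive must be written back explicitly.
    for line in fileinput.input(path, inplace=True):
        if line.startswith("#") or not line.strip():
            sys.stdout.write(line)  # keep headers and blank lines unchanged
        else:
            name, x, y = line.split()
            sys.stdout.write("{} {:.6f} {:.6f}\n".format(name, float(x), float(y)))

# Demonstrate on a throwaway file so no real data is touched.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("# this is a header\ncat 12.56 0.98\n")

reformat_in_place(demo_path)

with open(demo_path) as handle:
    result = handle.read()
os.remove(demo_path)
```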

Upvotes: 2
