Reputation: 2386
If I have data in a text file that looks like:
# this is a header
# and so is this
#
alligator 27.2 83.4
bear 23.9 90.2
cat 12.56 0.98
dog 15.97 0.88884
...
...
...I know I can read that data in (make a list of lists corresponding to the lines of data) by using the following block of code:
file1 = 'tmp.txt'
file1_data = []
data_input = open(file1,'r')
for line in data_input:
    if "#" not in line:
        line = line.strip().split()
        first_col_datum = line[0]
        second_col_datum = float(line[1])
        third_col_datum = float(line[2])
        file1_data.append([first_col_datum,second_col_datum,third_col_datum])
data_input.close()
...but my intuition tells me there is a much much more elegant way to complete this task. Basically I would like to read in the file line by line, ignore '#'s, and supply the command with a 'format' for each element in the line (like ["%s","%0.6f","%0.6f","%0.6f","%i"] or something...I will always know this a priori). What is the best practice to do this?
Upvotes: 2
Views: 174
Reputation: 180391
file1_data = []
with open(file1) as data_input:  # with automatically closes your files
    # skip the three header lines
    next(data_input), next(data_input), next(data_input)
    for line in data_input:
        # unpack the three whitespace-separated fields
        first_col_datum, second_col_datum, third_col_datum = line.split()
        file1_data.append([first_col_datum, float(second_col_datum), float(third_col_datum)])
Output:
[['alligator', 27.2, 83.4], ['bear', 23.9, 90.2], ['cat', 12.56, 0.98], ['dog', 15.97, 0.88884]]
Or use itertools.islice to skip the headers:
from itertools import islice
file1_data = []
with open(file1) as data_input:
    # start at the fourth line, i.e. skip the three header lines
    for line in islice(data_input, 3, None):
        first_col_datum, second_col_datum, third_col_datum = line.split()
        file1_data.append([first_col_datum, float(second_col_datum), float(third_col_datum)])
print(file1_data)
[['alligator', 27.2, 83.4], ['bear', 23.9, 90.2], ['cat', 12.56, 0.98], ['dog', 15.97, 0.88884]]
I'm not sure I fully understand the formatting part or what you want to do with it, but if you want to format the values, use str.format:
file1_data.append([first_col_datum, "{:.6f}".format(float(second_col_datum)), "{:.6f}".format(float(third_col_datum))])
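If the "format" in the question means a known type per column, one possible reading is a list of converter callables applied position by position. The sketch below assumes that interpretation; `parse_line` and `converters` are hypothetical names, not something from the question, and the sample lines are copied from the question's file.

```python
def parse_line(line, converters):
    """Split a whitespace-delimited line and convert each field by position."""
    fields = line.split()
    if len(fields) != len(converters):
        raise ValueError(
            "expected {} fields, got {}".format(len(converters), len(fields))
        )
    return [convert(field) for convert, field in zip(converters, fields)]


# Assumed column layout (known a priori, as the question says): name, float, float.
converters = [str, float, float]

sample_lines = [
    "# this is a header",
    "alligator 27.2 83.4",
    "bear 23.9 90.2",
]

# Skip comment lines, convert the rest.
rows = [
    parse_line(line, converters)
    for line in sample_lines
    if not line.startswith("#")
]
print(rows)  # [['alligator', 27.2, 83.4], ['bear', 23.9, 90.2]]
```

Adding a column then only means adding one entry to `converters`, for example `int` for a trailing integer field.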
If you were trying to ignore lines starting with # using an if statement, you should use str.startswith:
if not line.startswith("#"):
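The two tests differ when a data value itself contains a #. A small sketch of that difference, using a made-up `c#` row that is not in the question's file:

```python
lines = [
    "# this is a header",
    "#",
    "cat 12.56 0.98",
    "c# 1.0 2.0",  # hypothetical data row whose name contains '#'
]

# Substring test: drops any line that contains '#' anywhere.
kept_by_in = [line for line in lines if "#" not in line]

# Prefix test: drops only lines that begin with '#'.
kept_by_startswith = [line for line in lines if not line.startswith("#")]

print(kept_by_in)          # ['cat 12.56 0.98']
print(kept_by_startswith)  # ['cat 12.56 0.98', 'c# 1.0 2.0']
```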
Not sure where in your question it says you want to write the data to a file but if you do:
from itertools import islice
with open(file1) as data_input, open("output.txt", "w") as out:
    for line in islice(data_input, 3, None):
        first_col_datum, second_col_datum, third_col_datum = line.split()
        out.write("{} {:.6f} {:.6f}\n".format(first_col_datum, float(second_col_datum), float(third_col_datum)))
Upvotes: 1
Reputation: 4250
The simplest way to do this is with a lambda, used either in a list comprehension or with filter and map:
desired_list = lambda str_list: [str_list[0], float(str_list[1]), float(str_list[2])]
# With list comprehension
with open(file1) as fo:
    output_list = [desired_list(content.strip().split(" ", 3)) for content in fo.read().split("\n") if content and '#' not in content]
# With filter and map function
output_list = []
with open(file1) as fo:
    filtered_list = filter(lambda x: x and '#' not in x, fo.read().split("\n"))
    # split each remaining line before converting it; list() is needed on Python 3
    output_list = list(map(lambda x: desired_list(x.strip().split(" ", 3)), filtered_list))
I would prefer putting the logic into a function and calling it rather than using a lambda, much like Padraic Cunningham's answer.
def desired_list(line):
    # skip blank lines and lines containing '#'
    if not line.strip() or '#' in line:
        return None
    line_list = line.strip().split(" ", 3)
    return [line_list[0], float(line_list[1]), float(line_list[2])]
with open(file1) as fo:
    file_contents = fo.read().split("\n")
# filter(None, ...) drops the None results; list() is needed on Python 3
output_list = list(filter(None, map(desired_list, file_contents)))
This gives more control over the logic than the other two methods.
Upvotes: 1
Reputation: 67968
If you want to write in the middle of the file, use the fileinput module.
import fileinput
import re
for line in fileinput.input("C:\\Users\\Administrator\\Desktop\\new.txt", inplace=True):
    if not re.match(r"^#.*$", line):
        # do the formatting
        print("something", end="")  # Python 2: print "something",
Done in a few lines.
Remember that whatever you print goes into the file, so you have to read and print every line, modifying whichever ones you want to replace. In Python 2, use print "asd", where the trailing comma is important because it prevents print from adding a newline; in Python 3, use print("asd", end="") instead.
You don't want to modify lines starting with #, so add the condition:
if not re.match(r"^#.*$", line):
    # do the formatting and print
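Putting those pieces together, here is a minimal Python 3 sketch of an in-place rewrite. It works on a throwaway temporary file with sample rows from the question, and it uses str.startswith instead of re.match for the comment test, which is a substitution on my part. Header lines are echoed back unchanged, because with inplace=True any line that is not printed is dropped from the file.

```python
import fileinput
import os
import tempfile

# Build a throwaway file so the sketch does not touch real data.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("# this is a header\ncat 12.56 0.98\ndog 15.97 0.88884\n")

# With inplace=True, anything printed inside the loop is written back to the file.
with fileinput.input(path, inplace=True) as stream:
    for line in stream:
        if line.startswith("#"):
            # Header line: echo it unchanged (it already ends with a newline).
            print(line, end="")
        else:
            name, second, third = line.split()
            # Data line: rewrite both numbers with six decimal places.
            print("{} {:.6f} {:.6f}".format(name, float(second), float(third)))

# Read the rewritten file back, then clean up the temporary file.
with open(path) as handle:
    result = handle.read()
os.remove(path)
```

After this runs, `result` holds the header line followed by `cat 12.560000 0.980000` and `dog 15.970000 0.888840`.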
Upvotes: 2