Ulli Meraviglio

Reputation: 13

Hello, I have Python code that prints what I need, but I'd like it to write that result to a new file.

The file looks like a series of lines with IDs:

aaaa  
aass  
asdd  
adfg  
aaaa  

I'd like to write each ID and its number of occurrences in the old file to a new file, in the form:

aaaa    2  
asdd    1  
aass    1  
adfg    1  

The two elements should be separated by a tab.

The code I have prints what I want but doesn't write it to a new file:

with open("Only1ID.txt", "r") as file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print item.title(), file.count(item)

Upvotes: 1

Views: 45

Answers (1)

Byte Commander

Reputation: 6776

As you use Python 2, the simplest way to convert your console output to file output is the print chevron (>>) syntax, which redirects the output to any file-like object:

with open("filename", "w") as f:  # open a file in write mode
    print >> f, "some data"       # print 'into the file'
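
For comparison, the chevron syntax exists only in Python 2. A minimal sketch of the Python 3 counterpart, where print is a function that takes the target as a file argument (the same syntax is available in Python 2.6+ after `from __future__ import print_function`). The in-memory `io.StringIO` buffer is only an assumption made here so the example runs without touching the disk; any object opened with open(..., "w") works the same way:

```python
import io

out = io.StringIO()  # in-memory stand-in for open("filename", "w")

print("some data", file=out)          # Python 3 counterpart of: print >> f, "some data"
print("Aaaa", 2, sep="\t", file=out)  # sep="\t" puts a tab between the values

result = out.getvalue()  # everything that was "printed into the file"
```

The sep argument is also a convenient way to get the tab separator the question asks for.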

Your code could look like this after adding a second open call for the output file and adding the chevron to your print statement:

with open("Only1ID.txt", "r") as file, open("output.txt", "w") as out_file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print >> out_file, item.title(), file.count(item)

However, your code has a few other things that should be avoided or could be improved:

  • Do not use the same variable name file for both the file object returned by open and your processed list of strings. This is confusing; use two different names.

  • You can iterate directly over the file object, which works like a generator that yields the file's lines as strings. The lines are produced lazily: instead of first loading the whole file into memory as file.readlines() does and processing it afterwards, only one line is read and held at a time, whenever the next line is needed. This reduces memory use, which matters most for large files.

  • If you write a list comprehension but don't need its result as a list, because you only want to iterate over it once, a generator expression is more memory-efficient (same lazy behaviour as the file object's line iteration described above). The only syntactic difference between a list comprehension and a generator expression is the brackets: replace [...] with (...) and you have a generator. The downsides of a generator are that you cannot find out its length, cannot access items by index, and can iterate over it only once. With the Counter approach described further down you need none of these features, so a generator is fine here.

  • There is a simpler way to remove trailing newline characters from a line: line.rstrip() removes all trailing whitespace. If you want to keep e.g. spaces and remove only the newline, pass that character as an argument: line.rstrip("\n").

    However, when a line is printed on its own, it can be even easier and faster to keep its newline and suppress the extra line break that print adds, instead of removing the newline first only to have it re-added later. In Python 2 you suppress the line break of print by adding a comma at the end of the statement. This does not fit your output format, though: the count has to follow the ID on the same line, so the newline still has to be stripped there.

    print >> out_file, line,
    
  • There is a Counter type for counting occurrences of elements in a collection, which is faster and easier than counting yourself, because you don't need an additional count() call for every element. A Counter behaves mostly like a dictionary with your items as keys and their counts as values. Import it from the collections module and use it like this:

    from collections import Counter
    c = Counter(lines)
    for item in c:
        print item, c[item]
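
The generator and Counter points above can be tried out together without any file. A minimal sketch in Python 3 syntax, assuming the sample IDs from the question; the names raw_lines, cleaned, counts and ranked are made up for this illustration:

```python
from collections import Counter

# Sample IDs from the question, standing in for the lines of the input file.
raw_lines = ["aaaa\n", "aass\n", "asdd\n", "adfg\n", "aaaa\n"]

# Generator expression: each line is cleaned lazily, only when Counter asks for it.
cleaned = (line.lower().rstrip("\n") for line in raw_lines)

# Counter consumes the generator in a single pass and tallies every ID.
counts = Counter(cleaned)

# most_common() returns (item, count) pairs with the highest count first.
ranked = counts.most_common()

for item in sorted(counts):
    print(item.title(), counts[item])
```

Because the generator is consumed once by Counter, its lack of len() and indexing does not matter. most_common() is an option if the output should be ordered by frequency rather than alphabetically.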
    

With all those suggestions (except the one about not removing the line breaks) applied and the variables renamed to something clearer, the improved code looks like this:

from collections import Counter
with open("Only1ID.txt") as in_file, open("output.txt", "w") as out_file:
    counter = Counter(line.lower().rstrip("\n") for line in in_file)
    for item in sorted(counter):
        print >> out_file, item.title(), counter[item]
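
Note that the question asks for a tab between the ID and its count, whereas print with a comma separates the values with a space. A minimal sketch that writes the tab explicitly with out_file.write and runs on both Python 2.7 and Python 3. The function name write_id_counts is hypothetical, and stripping surrounding whitespace and skipping blank lines are assumptions about the input file, not something the question states:

```python
from collections import Counter


def write_id_counts(in_path, out_path):
    """Count IDs case-insensitively and write 'ID<TAB>count' lines to out_path."""
    with open(in_path) as in_file, open(out_path, "w") as out_file:
        # Assumption: surrounding whitespace is not part of an ID,
        # and blank lines should not be counted as IDs.
        counter = Counter(
            line.strip().lower() for line in in_file if line.strip()
        )
        for item in sorted(counter):
            # Build the line explicitly so the separator is a tab.
            out_file.write("{}\t{}\n".format(item.title(), counter[item]))
    return counter
```

It could be called as write_id_counts("Only1ID.txt", "output.txt"), using the file names from the code above.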

Upvotes: 1
