A Tammour

Process a text file and export it to a CSV file

I'm using this code to take a text file as input and turn it into a CSV file as output. The CSV file has two columns: one for the words and the other for the count of each word.

from collections import Counter
file = open(r"/Users/abdullahtammour/Documents/txt/1984/1984.txt", "r", encoding="utf-8-sig")
wordcount={}
wordcount = Counter((file.read().split()))
for item in wordcount.items():
    print("{}\t{}".format(*item), file=open("/Users/abdullahtammour/Documents/txt/1984/1984.csv", "a"))
file.close()

I want to improve the code by adding two features. First (and most important), I want only words in the output file: no numbers and no characters like (*&-//.,!?) and so on. Second, I want all the words in the output file to be lowercase.

Any help will be appreciated.

Answers (1)

Silas Coker

You can use the string method isalpha() to check whether a word contains only alphabetic characters, and lower() to convert it to lowercase. I'm assuming you don't want apostrophes or other punctuation in your words either, but if apostrophes are OK (as in "don't"), you could remove them just for the check with replace(), like this:

word.replace("'",'').isalpha()
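A small sketch of that filtering on a few sample tokens, assuming tokens come from a plain split() as in your code. keep_word and allow_apostrophes are hypothetical names used only for illustration. Note that replace() only affects the check; the word that is kept still contains its apostrophe:

```python
def keep_word(token, allow_apostrophes=False):
    """Hypothetical helper: return the lowercased token if it counts as a word, else None."""
    # Optionally ignore apostrophes for the check only, so "don't" can pass
    check = token.replace("'", "") if allow_apostrophes else token
    # isalpha() is False for empty strings, digits and punctuation
    if check.isalpha():
        return token.lower()
    return None


tokens = ["Big", "Brother", "1984", "don't", "war.", "IS"]
print([keep_word(t) for t in tokens])
# ['big', 'brother', None, None, None, 'is']
print([keep_word(t, allow_apostrophes=True) for t in tokens])
# ['big', 'brother', None, "don't", None, 'is']
```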

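It's also better to open the output file once than to open it once per word, which is what happens when you call open() in the body of the loop. Besides being inefficient, each of those file objects is only closed (and its buffer flushed) when it is garbage-collected. CPython does that promptly, but other Python implementations don't guarantee it, so output could be delayed or you could run out of file handles. And because the file is opened in append mode ("a"), re-running the script adds a second copy of every row; the version below uses "w" to overwrite the file instead.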

I rewrote it with a 'with' statement, which is roughly equivalent to opening the file at the start of the block and closing it at the end, even if an exception is raised inside the block.
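A minimal sketch of that equivalence. It is simplified: the real mechanism calls the file object's __enter__ and __exit__ methods. The out.tsv path inside a temporary directory is a hypothetical file used only for illustration:

```python
import os
import tempfile

# Hypothetical output path in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "out.tsv")

# with statement: the file is closed automatically when the block ends
with open(path, "w", encoding="utf-8") as f:
    print("big", 2, sep="\t", file=f)
print(f.closed)  # True

# Roughly equivalent try/finally version
f = open(path, "a", encoding="utf-8")
try:
    print("brother", 1, sep="\t", file=f)
finally:
    f.close()  # runs even if print() raises
print(f.closed)  # True

with open(path, encoding="utf-8") as f:
    print(f.read().splitlines())  # ['big\t2', 'brother\t1']
```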

Not as important, but you can use the 'sep' keyword in print() instead of manually inserting a tab, like this:

print(arg1, arg2, sep='\t')
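The first part of the sketch below checks that print() with sep='\t' gives the same line as the manual format string. The question asks for a CSV file, but both versions write tab-separated lines, so the second part shows one option using the standard library's csv module, which writes comma-separated rows and quotes fields when needed. It writes to io.StringIO buffers instead of real files so the output is easy to inspect. The sample counts and the header row are assumptions for illustration:

```python
import csv
import io
from collections import Counter

wordcount = Counter(["the", "war", "the"])  # small sample counts

# print() with sep='\t' produces the same line as "{}\t{}".format(...)
tsv = io.StringIO()
print("the", 2, sep="\t", file=tsv)
print(repr(tsv.getvalue()))  # 'the\t2\n'

# csv.writer writes comma-separated rows ending in "\r\n" by default.
# With a real file, open it as open(path, "w", newline="", encoding="utf-8").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["word", "count"])  # hypothetical header row
for word, count in wordcount.items():
    writer.writerow([word, count])
print(repr(buffer.getvalue()))  # 'word,count\r\nthe,2\r\nwar,1\r\n'
```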

Revising your code:

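from collections import Counter

# Read the input once; the with block closes the file afterwards
with open(r"/Users/abdullahtammour/Documents/txt/1984/1984.txt", "r", encoding="utf-8-sig") as infile:
    # Keep only purely alphabetic tokens and lowercase them *before* counting,
    # so "The" and "the" end up in a single row instead of two
    wordcount = Counter(word.lower() for word in infile.read().split() if word.isalpha())

# Open the output file once, in "w" mode so re-running the script overwrites it
with open("/Users/abdullahtammour/Documents/txt/1984/1984.csv", "w", encoding="utf-8") as outfile:
    for word, count in wordcount.items():
        # One tab-separated "word<TAB>count" line per word
        print(word, count, sep='\t', file=outfile)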
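Note that isalpha() drops any token with punctuation attached, so a word like "you." at the end of a sentence is not counted at all. If you would rather strip the punctuation and still count the word, one option is a regular expression that extracts runs of letters. This is a swapped-in technique, not part of the answer above. It assumes ASCII letters only, and it splits contractions such as "don't" into "don" and "t". count_words is a hypothetical helper name:

```python
import re
from collections import Counter


def count_words(text):
    """Hypothetical helper: count runs of ASCII letters, ignoring case, digits and punctuation."""
    # Lowercase first so "The" and "the" are counted as the same word
    return Counter(re.findall(r"[a-z]+", text.lower()))


sample = "War is peace. Freedom is slavery. 2 + 2 = 5!"
for word, count in count_words(sample).most_common():
    print(word, count, sep="\t")
```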
