Dale Addison

Reputation: 59

Writing results of NLTK FreqDist to a .csv file as a row in Python

I'm trying to write the frequency counts of specific words in a text file to a .csv file. The words come from a Python list, word_list (I haven't included it in the code listing as it has several hundred entries).

file_path = 'D:/TestHedges/Hedges_Test_11.csv'
corpus_root = test_path
wordlists = PlaintextCorpusReader(corpus_root, '.*')
print(wordlists.fileids())
CIK_List = []
freq_out = []

for filename in glob.glob(os.path.join(test_path, '*.txt')):

 CIK = filename[33:39]
 CIK = CIK.strip('_')
 # CIK = CIK.strip('_0') commented out to see if it deals with just removing _.  It does not 13/9/2020



 newstext = wordlists.words()
 fdist = nltk.FreqDist([w.lower() for w in newstext])

 CIK_List.append(CIK)
 with open(file_path, 'w', newline='') as csv_file:
  writer = csv.writer(csv_file)
  writer.writerow(["CIK"] + word_list)
  for val in CIK_List:
   writer.writerow([val])
  for m in word_list:
     print(CIK, [fdist[m]], end='')
     writer.writerows([fdist[m]])

My problem is with writing fdist[m] as a row in the .csv file. It raises this error:

_csv.Error: iterable expected, not int

How can I rewrite this so that the frequency distribution is written as a row in the .csv file?

Thanks in advance

Upvotes: 1

Views: 410

Answers (1)

sophros

Reputation: 16700

You have two choices: either use writerow instead of writerows, or build a list of rows first and pass that to writer.writerows instead of [fdist[m]]. writerows expects an iterable of rows, and each row must itself be a tuple (or another iterable), whereas fdist[m] is a plain int. So for writerows to work you have to wrap the value in a tuple:

 writer.writerows([(fdist[m],)])

Here, the trailing comma is what makes (fdist[m],) a one-element tuple.
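To make the difference concrete, here is a minimal sketch that writes to an in-memory buffer rather than a real file. The counts dict is a hypothetical stand-in for fdist, and the values in it are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for fdist: word -> count
counts = {"may": 3, "might": 1}

buffer = io.StringIO()
writer = csv.writer(buffer)

# writerow takes ONE row, i.e. an iterable of cell values.
writer.writerow([counts["may"]])

# writerows takes an iterable OF rows, so every element must itself
# be iterable (here: one-element tuples).
writer.writerows([(counts["may"],), (counts["might"],)])

# By contrast, writer.writerows([counts["may"]]) hands a bare int to the
# writer as if it were a row, which raises
# _csv.Error: iterable expected, not int

lines = buffer.getvalue().splitlines()
```

Each of the three calls above produces a row with a single cell, so lines ends up with three one-value entries.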

To write all of the values in one row, replace this code:

for m in word_list:
     print(CIK, [fdist[m]], end='')
     writer.writerows([fdist[m]])

You should use:

for m in word_list:
     print(CIK, [fdist[m]], end='')
writer.writerows(([fdist[m] for m in word_list],))

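Note the list comprehension: it builds the whole row of counts first, and that row is then wrapped in a one-element tuple so that writerows sees a single row.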
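If the goal is one row per file, with the CIK in the first column followed by the counts, something along these lines might work. This is only a sketch under assumptions: write_counts and counts_by_cik are hypothetical names, the numbers are invented, and in your code the per-file counts would come from a FreqDist built for each file.

```python
import csv
import io
from collections import Counter

word_list = ["may", "might", "could"]  # hypothetical hedge words

# Hypothetical per-file counts keyed by CIK (made-up values).
counts_by_cik = {
    "123456": Counter({"may": 4, "could": 1}),
    "654321": Counter({"might": 2}),
}

def write_counts(csv_file, word_list, counts_by_cik):
    """Write a header row, then one row per CIK: [CIK, count, count, ...]."""
    writer = csv.writer(csv_file)
    writer.writerow(["CIK"] + word_list)
    for cik, counts in counts_by_cik.items():
        # Missing words count as 0, so every row has the same width.
        writer.writerow([cik] + [counts[w] for w in word_list])

buffer = io.StringIO()  # stands in for open(file_path, 'w', newline='')
write_counts(buffer, word_list, counts_by_cik)
rows = buffer.getvalue().splitlines()
```

The file is opened once and the header written once, outside the per-file loop, so earlier rows are not overwritten on each iteration.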

On a different note, just from looking at your code, it seems to me that you could do the same without the NLTK library by using collections.Counter from the standard library. FreqDist is a subclass of Counter.
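A rough sketch of the Counter approach follows. The sample text and the naive whitespace tokenization are assumptions made for illustration; NLTK's corpus reader tokenizes differently, so counts on real files may not match exactly.

```python
from collections import Counter

# Hypothetical sample text standing in for one file's contents.
text = "We may revise this. It might change, and it may not."

# Naive tokenization (an assumption): split on whitespace, strip
# surrounding punctuation, lower-case.
tokens = [t.strip(".,").lower() for t in text.split()]

counts = Counter(tokens)

word_list = ["may", "might", "could"]  # hypothetical hedge words

# Like FreqDist, Counter returns 0 for words it has never seen.
row = [counts[w] for w in word_list]
```

The resulting row can be passed straight to writer.writerow.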

Upvotes: 1
