HonsTh

Reputation: 65

Converting list to dictionary, and tokenizing the key values - possible?

So basically I have a folder of files that I'm opening and reading into Python.

I want to search these files and count the keywords in each file, to make a dataframe like the attached image.

I have managed to open and read these files into a list, but my problem is as follows:

Edit 1:

I decided to try importing the files as a dictionary instead. It works, but when I try to lower-case the values, I get a 'list' object attribute error, even though in my variable explorer it's defined as a dictionary.

import os
filenames = os.listdir('.')
file_dict = {}
for file in filenames:
    with open(file) as f:
        items = [i.strip() for i in f.read().split(",")]
    file_dict[file.replace(".txt", "")] = items

def lower_dict(d):
    new_dict = dict((k, v.lower()) for k, v in d.items())
    return new_dict
print(lower_dict(file_dict))


Output:

AttributeError: 'list' object has no attribute 'lower'

Pre-edit post:

1. Each list value doesn't retain the filename key, so I don't have the rows I need.

2. I can't search for keywords in the list anyway, because it isn't tokenized, so I can't count the keywords per file.

Here's my code for opening the files, converting them to lowercase and storing them in a list.

How can I transform this into a dictionary that retains the filename as the key and has tokenized values? Additionally, is it better to somehow import each file and its contents into a dictionary directly? Can I still tokenize and lower-case everything that way?

import os
import nltk
# create list of filenames to loop over
filenames = os.listdir('.')
# create empty lists for storage
Lcase_content = []
tokenized = []
num = 0
# read files from folder, convert to lower case 
for filename in filenames:  
    if filename.endswith(".txt"): 
        with open(os.path.join('.', filename)) as file: 
            content = file.read()   
            # convert to lower-case value 
            Lcase_content.append(content.lower())

            ## these two lines below don't work - index out of range error
            tokenized[num] = nltk.tokenize.word_tokenize(tokenized[num])
            num = num + 1

Upvotes: 0

Views: 2184

Answers (1)

Coding4pho

Reputation: 86

You can compute the count of each token by using the collections module. collections.Counter can take a list of strings and return a dictionary-like Counter object with each token as a key and its count as the value. Since NLTK's word_tokenize takes a string and returns a list of tokens, to get a dictionary with tokens and their counts, you can basically do this:

Counter(nltk.tokenize.word_tokenize(content))
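For instance, with a made-up string (not your actual file content), the combination returns:

from collections import Counter
import nltk

text = "cat dog cat squirrel"
print(Counter(nltk.tokenize.word_tokenize(text)))
# Counter({'cat': 2, 'dog': 1, 'squirrel': 1})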

Since you want your file names as the index (first column), make it a nested dictionary, with the file name as the key and another dictionary of tokens and counts as the value, which looks like this:

{'file1.txt': Counter({'cat': 4, 'dog': 0, 'squirrel': 12, 'sea horse': 3}), 'file2.txt': Counter({'cat': 11, 'dog': 4, 'squirrel': 17, 'sea horse': 0})}

If you are familiar with pandas, you can convert your dictionary to a pandas DataFrame. That makes it much easier to work with tsv/csv/excel files, since you can export the DataFrame result as a csv file. Make sure you apply .lower() to your file content and include orient='index' so that the file names become your index.

import os
import nltk
from collections import Counter
import pandas as pd

result = dict()

filenames = os.listdir('.')
for filename in filenames:
    if filename.endswith(".txt"):
        with open(os.path.join('.', filename)) as file:
            # lower-case the whole file, then count every token in it
            content = file.read().lower()
            result[filename] = Counter(nltk.tokenize.word_tokenize(content))

# one row per file name, one column per token; tokens missing from a file become 0
df = pd.DataFrame.from_dict(result, orient='index').fillna(0)

# add a running total of tokens per file
df['total words'] = df.sum(axis=1)

df.to_csv('words_count.csv', index=True)
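If you only need specific keywords rather than every token in the files, you can pick just those columns out of the resulting DataFrame. A small sketch, using an illustrative keyword list based on the example dictionary above:

keywords = ['cat', 'dog', 'squirrel']   # illustrative list; replace with your own keywords
keyword_df = df.reindex(columns=keywords, fill_value=0)
keyword_df.to_csv('keyword_count.csv', index=True)   # hypothetical output name

Note that a multi-word keyword like 'sea horse' gets split into two tokens by word_tokenize, so it would need separate handling.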

Re: your first attempt, since your items is a list (see [i.strip() for i in f.read().split(",")]), the values of file_dict are lists, and you can't apply .lower() to a list - only to the strings inside it.
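If you want to keep that dictionary-of-lists approach, a minimal fix is to lower-case each string inside each list rather than the list itself:

def lower_dict(d):
    # lower-case every string in every list value
    return {k: [item.lower() for item in v] for k, v in d.items()}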

Re: your second attempt, your tokenized list is empty because it was initialized as tokenized = []. That's why tokenized[num] = nltk.tokenize.word_tokenize(tokenized[num]) with num = 0 gives you the index out of range error - there is no element at index 0 to read from or assign to.
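A minimal fix for that loop (assuming you want tokenized to be a list of token lists, parallel to Lcase_content) is to append instead of assigning by index, and to tokenize the content you just read:

for filename in filenames:
    if filename.endswith(".txt"):
        with open(os.path.join('.', filename)) as file:
            content = file.read().lower()
            Lcase_content.append(content)
            # append a new entry instead of writing to an index that doesn't exist yet
            tokenized.append(nltk.tokenize.word_tokenize(content))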

Upvotes: 1
