Normality

Reputation: 33

Searching through a file in Python

Say I have a file of restaurant names and need to search it for a particular string such as "Italian". How would the code look if I searched the file for that string and printed the number of restaurants whose names contain it?

path = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"

# Read the file once; each line holds one restaurant name
with open(path) as f:
    searchlines = f.readlines()

print("There are", len(searchlines), "restaurants in the dataset")

# For every line containing "GREEK", print it and the two lines that follow
for i, line in enumerate(searchlines):
    if "GREEK" in line:
        for l in searchlines[i:i+3]:
            print(l, end="")
        print()

Upvotes: 0

Views: 127

Answers (2)

Padraic Cunningham

Reputation: 180391

You could count all the words using a Counter dict and then do lookups for certain words:

from collections import Counter
from string import punctuation

f_name = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"


with open(f_name) as f:
    #  sum(1 for _ in f) -> counts lines
    print ("There are", sum(1 for _ in f), "restaurants in the dataset")
    # reset file pointer back to the start
    f.seek(0)
    # count how many lines each word appears in (at most once per line);
    # strip punctuation and lowercase first, then deduplicate with a set
    cn = Counter(word for line in f for word in {w.strip(punctuation).lower() for w in line.split()})
    print(cn["italian"])  # no KeyError if missing, will be 0

Each line's words are put in a set, so a word that appears twice in one restaurant's name is counted only once. This looks for exact matches. If you also want to match partials like foo in foobar, it becomes more complex to build a structure that supports efficient lookups for multiple words.

If you really just want to count one word, all you need is sum to count the lines that contain the substring:

f_name = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"

with open(f_name) as f:
    print ("There are", sum(1 for _ in f), "restaurants in the dataset")
    f.seek(0)
    sub = "italian"
    # each True counts as 1, so this is the number of lines containing sub
    count = sum(sub in line.lower() for line in f)
    print(count)

If you want exact matches, you would need the split logic again or to use a regex with word boundaries.
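A minimal sketch of the word-boundary regex option mentioned above. The sample_lines list and the count_lines_with_word helper are hypothetical stand-ins for illustration, not part of the original code; with the real data you would pass the open file object instead of the list.

```python
import re

# Hypothetical sample data standing in for the lines of restaurant-names.txt
sample_lines = [
    "Luigi's Italian Kitchen",
    "Italiano Express",
    "The Greek Corner",
    "ITALIAN Italian Bistro",
]

def count_lines_with_word(lines, word):
    """Count lines that contain `word` as a whole word, ignoring case."""
    # \b matches a word boundary, so "italian" does not match inside "Italiano";
    # re.escape guards against regex metacharacters in the search word
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for line in lines if pattern.search(line))

italian_count = count_lines_with_word(sample_lines, "italian")
print(italian_count)
```

Because pattern.search stops at the first hit, a line that mentions the word twice still counts as one restaurant.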

Upvotes: 2

Arindam

Reputation: 303

Read the file into a string, then use the string's count method.
Code:

# assume the file's contents have been read into the string s1
print(s1.count("italian"))
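One caveat worth illustrating: str.count is case-sensitive and counts every occurrence, not every restaurant. The sketch below uses a hypothetical in-memory string in place of the real file contents to show how the two numbers can differ.

```python
# Hypothetical sample text standing in for the contents of the file
s1 = "Luigi's Italian Kitchen\nITALIAN Italian Bistro\nThe Greek Corner\n"

# Lowercase first so "Italian" and "ITALIAN" are both found;
# this counts every occurrence, including repeats on one line
occurrences = s1.lower().count("italian")

# Count lines (restaurants) that mention the word at least once
matching_lines = sum("italian" in line.lower() for line in s1.splitlines())

print(occurrences, matching_lines)
```

If the goal is the number of restaurants, the per-line count is the one that answers the question.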

Upvotes: -1
