Reputation: 359
I'm building a scraper/crawler for Linux directories. In essence, the program takes the user's input for a file type to scrape for, which is where my question comes in.
I'm storing the acceptable file extensions in a dictionary of lists, for example:
file_types = {'images': ['png', 'jpg', 'jpeg', 'gif', 'bmp'], 'text': ['txt', 'doc', 'pdf']}
To show the user the available options, I use this for loop:
for k, v in file_types.items():
    print(k, v)
Which prints the dictionary in this format:
audio ['mp3', 'mpa', 'wpi', 'wav', 'wpi']
text ['txt', 'doc', 'pdf']
video ['mp4', 'avi', '3g2', '3gp', 'mkv', 'm4v', 'mov', 'mpg', 'wmv', 'flv']
images ['png', 'jpg', 'jpeg', 'gif', 'bmp']
Now if I do:
scrape_for = input("Please enter either the type of file, or the extension you would like to scrape for: \n")
How can I validate that the user's input exists in my dictionary file_types as either a key or a value? (I say key or value so that if the user enters 'images', I can use the values stored under the key 'images'.)
Upvotes: 1
Views: 132
Reputation: 410
Using a Python list comprehension, make a flat list of all the extensions:
list_of_extensions = [
    item
    for extensionList in file_types.values()
    for item in extensionList
]
Now use the idiomatic Python construct item in list_var, which evaluates to True if the item is present in that list, together with the or operator:
if scrape_for in file_types or scrape_for in list_of_extensions:
    pass  # do something
else:
    print("Unsupported file type: " + scrape_for)
Note: using the in operator directly on the dict, as in scrape_for in file_types, is equivalent in effect to scrape_for in file_types.keys().
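To illustrate the note above, here is a minimal sketch using the sample dictionary from the question. It shows that in on a dict tests the keys only, which is why a separate flat list of extensions is needed. The names key_hit, same_as_keys, ext_in_dict, ext_in_flat, and flat are introduced here purely for illustration, and itertools.chain.from_iterable is swapped in as an alternative way to flatten the lists.

```python
from itertools import chain

file_types = {'images': ['png', 'jpg', 'jpeg', 'gif', 'bmp'],
              'text': ['txt', 'doc', 'pdf']}

# Membership tests on a dict look at its keys only.
key_hit = 'images' in file_types
same_as_keys = 'images' in file_types.keys()

# An extension is a value, not a key, so this test fails.
ext_in_dict = 'png' in file_types

# Flatten all the value lists into one list of extensions.
flat = list(chain.from_iterable(file_types.values()))
ext_in_flat = 'png' in flat

print(key_hit, same_as_keys, ext_in_dict, ext_in_flat)
```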
Upvotes: 0
Reputation: 25829
I'd first flatten the extensions list into a set so you don't have to loop through it later on and can do quick on-the-spot lookups:
file_types = {'images': ['png', 'jpg', 'jpeg', 'gif', 'bmp'], 'text': ['txt', 'doc', 'pdf']}
file_extensions = set(sum(file_types.values(), []))
scrape_for = input("Enter the type / extension to scrape: ").lower()
if scrape_for not in file_types and scrape_for not in file_extensions:
    print("I don't support this type / extension!")
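Since the question also wants to use the values of a key such as 'images' once the input is validated, the check above could be folded into a small helper that returns the extensions to scrape for. This is only a sketch: resolve_extensions is a hypothetical name, and stripping whitespace and a leading dot from the input is an assumption about what users might type, not something the question specifies.

```python
file_types = {'images': ['png', 'jpg', 'jpeg', 'gif', 'bmp'],
              'text': ['txt', 'doc', 'pdf']}


def resolve_extensions(choice, file_types):
    """Return the extensions matching choice, or None if it is unsupported."""
    # Assumption: tolerate input such as " .PNG" by normalising it to "png".
    choice = choice.strip().lower().lstrip('.')
    if choice in file_types:
        # The input names a category: use every extension listed under it.
        return list(file_types[choice])
    for extensions in file_types.values():
        if choice in extensions:
            # The input names a single known extension.
            return [choice]
    return None


print(resolve_extensions('images', file_types))
print(resolve_extensions('pdf', file_types))
print(resolve_extensions('exe', file_types))
```

A None result can then be treated as the "unsupported" case, while any other result is directly usable as the list of extensions to look for.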
Upvotes: 3