Reputation: 23
I have this CSV file:
Cat, and, dog, bites
Yahoo, news, claims, a, cat, mated, with, a, dog, and, produced, viable, offspring
Cat, killer, likely, is, a, big, dog
Professional, free, advice, on, dog, training, puppy, training
Cat, and, kitten, training, and, behavior
Dog, &, Cat, provides, dog, training, in Eugene, Oregon
Dog, and, cat, is, a, slang, term, used, by, police, officers, for, a, male-female, relationship
Shop, for, your, show, dog, grooming, and, pet, supplies
I want to convert all the words to lowercase and create a list containing all the unique items from the CSV file above. Any ideas? Thanks in advance! So far, I have managed to convert the words to lowercase:
unique_row_items = set([field.strip().lower() for field in row])
But I can't manage the second part (collecting the unique items):
import csv

def unique():
    rows = list(csv.reader(open('example_1.csv', 'r'), delimiter=','))
    result = []
    for r in rows:
        key = r
        if key not in result:
            result.append(r)
    return result
But this does not give the results I want.
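A quick sketch of why the function above falls short, running the same loop on a small in-memory sample (io.StringIO stands in for example_1.csv; the two rows are copied from the question). Each r is a whole row, so the membership check removes duplicate rows, not duplicate words, and the fields still carry their leading spaces.

```python
import csv
import io

# Hypothetical stand-in for example_1.csv: two rows taken from the question.
sample = io.StringIO("Cat, and, dog, bites\nCat, killer, likely, is, a, big, dog\n")

rows = list(csv.reader(sample, delimiter=','))
result = []
for r in rows:
    # r is a whole row (a list of fields), so this deduplicates rows, not words
    if r not in result:
        result.append(r)

# Both rows survive intact, and csv leaves the space after each comma in place.
print(result)
# → [['Cat', ' and', ' dog', ' bites'], ['Cat', ' killer', ' likely', ' is', ' a', ' big', ' dog']]
```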
Upvotes: 0
Views: 14028
Reputation: 365657
If you can't figure out how to do everything at once, do it step by step.
So, let's write an explicit for statement over the rows:
import csv

result = []
# use `with` so the file gets closed
with open('example_1.csv', 'r') as f:
    # no need for `list` here
    rows = csv.reader(f, delimiter=',')
    for row in rows:
        # no need for `set([...])`, just `set(...)`
        unique_row_items = set(field.strip().lower() for field in row)
        for item in unique_row_items:
            if item not in result:
                result.append(item)
But if you look at this, you're using a list as a set. It'll be easier (and more efficient) to just use a set as a set; then you don't need the if … in check:
result = set()
with open('example_1.csv', 'r') as f:
    # no need for `list` here
    rows = csv.reader(f, delimiter=',')
    for row in rows:
        unique_row_items = set(field.strip().lower() for field in row)
        for item in unique_row_items:
            result.add(item)
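If you do want a list (as the question asks) with the words in the order they first appear, one alternative, not part of the answer's approach, is to use a dict as an insertion-ordered set: dict.fromkeys keeps only the first occurrence of each key, and dicts preserve insertion order from Python 3.7 on. A sketch on a hypothetical in-memory sample standing in for the file:

```python
import csv
import io

# Hypothetical in-memory sample standing in for example_1.csv.
sample = io.StringIO("Cat, and, dog, bites\nCat, killer, likely, is, a, big, dog\n")

# dict.fromkeys ignores repeated keys and keeps first-seen order
# (guaranteed for dicts since Python 3.7), so it acts as an ordered set.
seen = dict.fromkeys(
    field.strip().lower()
    for row in csv.reader(sample, delimiter=',')
    for field in row
)
result = list(seen)
print(result)
# → ['cat', 'and', 'dog', 'bites', 'killer', 'likely', 'is', 'a', 'big']
```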
And now, adding each element from one set to another is just taking the union of the sets, so you can replace those last two lines with, e.g.:
result |= unique_row_items
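For reference, `result |= s` adds every element of s to result in place, the same effect as the add loop it replaces (or result.update(s)). A quick check with two made-up sets:

```python
# Made-up sets standing in for the running result and one row's items.
result = {'cat', 'and', 'dog'}
unique_row_items = {'cat', 'killer', 'dog'}
same_object = result

# Same effect as: for item in unique_row_items: result.add(item)
result |= unique_row_items

print(sorted(result))          # → ['and', 'cat', 'dog', 'killer']
print(result is same_object)   # → True: |= mutates the existing set
```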
And now, if you want to turn it all back into one big expression, you can:
with open('example_1.csv', 'r') as f:
    result = set.union(*(set(field.strip().lower() for field in row)
                         for row in csv.reader(f, delimiter=',')))
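One caveat with set.union(*...): if the file has no rows, it is called with no arguments and raises a TypeError. Calling union on an empty set instead, set().union(*...), returns an empty set in that case and gives the same result otherwise. A sketch, where unique_words is a hypothetical helper name and io.StringIO stands in for the file:

```python
import csv
import io

def unique_words(f):
    # set().union(...) copes with zero rows; set.union(*()) would raise TypeError
    return set().union(*({field.strip().lower() for field in row}
                         for row in csv.reader(f, delimiter=',')))

print(unique_words(io.StringIO("")))  # → set()
print(sorted(unique_words(io.StringIO("Cat, and, dog\nDog, bites\n"))))
# → ['and', 'bites', 'cat', 'dog']
```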
Also, in Python 2.7+, you can just use a set comprehension instead of calling set on a list comprehension or generator expression:
with open('example_1.csv', 'r') as f:
    result = set.union(*({field.strip().lower() for field in row}
                         for row in csv.reader(f, delimiter=',')))
In fact, you can even turn the whole thing into one big comprehension with a nested loop:
with open('example_1.csv', 'r') as f:
    result = {field.strip().lower()
              for row in csv.reader(f, delimiter=',')
              for field in row}
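The for clauses in a nested comprehension run left to right in the same order as the equivalent nested for statements, which is why the row loop comes before the field loop. A quick check on a hypothetical in-memory sample:

```python
import csv
import io

text = "Cat, and, dog\nDog, bites\n"  # hypothetical sample rows

# Comprehension: outer loop (rows) first, inner loop (fields) second...
comp = {field.strip().lower()
        for row in csv.reader(io.StringIO(text), delimiter=',')
        for field in row}

# ...which is the same as these nested statements.
loops = set()
for row in csv.reader(io.StringIO(text), delimiter=','):
    for field in row:
        loops.add(field.strip().lower())

print(comp == loops)  # → True
```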
Or, alternatively, you don't have to make it one big expression:
with open('example_1.csv', 'r') as f:
    rows = csv.reader(f, delimiter=',')
    rowsets = ({field.strip().lower() for field in row} for row in rows)
    result = set.union(*rowsets)
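One thing to keep in mind with this version: rowsets is a lazy generator, so nothing is read from the file until set.union(*rowsets) unpacks it. That is why the union has to stay inside the with block. A small demonstration, with io.StringIO standing in for the file and an explicit close() mimicking the end of the with block:

```python
import csv
import io

f = io.StringIO("Cat, and, dog\nDog, bites\n")  # hypothetical sample
rows = csv.reader(f, delimiter=',')
rowsets = ({field.strip().lower() for field in row} for row in rows)

f.close()  # what leaving the `with` block does to a real file

raised = False
try:
    set.union(*rowsets)  # the generator only now tries to read from f
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```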
Also, as Padraic Cunningham pointed out, one of the dialect options the csv module offers is skipinitialspace, which does just what it sounds like (it skips whitespace immediately following each delimiter), so you don't need the strip anymore. For example, using the big set comprehension:
with open('example_1.csv', 'r') as f:
    result = {field.lower()
              for row in csv.reader(f, delimiter=',', skipinitialspace=True)
              for field in row}
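To see what skipinitialspace changes, here is one row from the question parsed with and without it (io.StringIO stands in for the file). It only drops whitespace right after each delimiter, so this assumes the fields have no trailing spaces:

```python
import csv
import io

line = "Dog, &, Cat, provides, dog, training\n"

plain = next(csv.reader(io.StringIO(line), delimiter=','))
skipped = next(csv.reader(io.StringIO(line), delimiter=',', skipinitialspace=True))

print(plain)    # → ['Dog', ' &', ' Cat', ' provides', ' dog', ' training']
print(skipped)  # → ['Dog', '&', 'Cat', 'provides', 'dog', 'training']
```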
Or, alternatively, it looks like your format is really using comma-space rather than comma as a delimiter. The csv module only accepts one-character delimiters, so in that case you can split each line yourself:
with open('example_1.csv', 'r') as f:
    result = {field.lower()
              for line in f
              for field in line.rstrip('\n').split(', ')}
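Note that csv.reader rejects a multi-character delimiter such as ', ' with a TypeError rather than splitting on comma-space. A quick check, plus a plain str.split that does split on ', ' (the line is copied from the question; io.StringIO stands in for the file):

```python
import csv
import io

line = "Cat, and, dog, bites\n"  # one row from the question

# csv validates the dialect when the reader is created and rejects
# any delimiter longer than one character with a TypeError.
try:
    csv.reader(io.StringIO(line), delimiter=', ')
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)

# Splitting each line yourself handles a multi-character separator.
fields = line.rstrip('\n').split(', ')
print(fields)  # → ['Cat', 'and', 'dog', 'bites']
```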
Upvotes: 7
Reputation: 22954
To store all the words in lowercase, you can use the .lower() method on strings. After building a list of all the words, create a set from it, which keeps only the unique values.
with open("data_file.csv", "r") as data_file:
    all_words = []
    for line in data_file.readlines():
        for word in line.split(","):
            # strip the space after each comma and the trailing newline
            all_words.append(word.strip().lower())
unique_words = set(all_words)
print(unique_words)
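Note that str.split(",") keeps the space after each comma and the newline at the end of the line, so unless each word is also stripped, "dog" and " dog" count as different words. A quick check on one line from the question:

```python
line = "Cat, killer, likely, is, a, big, dog\n"

raw = [word.lower() for word in line.split(",")]
clean = [word.strip().lower() for word in line.split(",")]

print(raw)    # → ['cat', ' killer', ' likely', ' is', ' a', ' big', ' dog\n']
print(clean)  # → ['cat', 'killer', 'likely', 'is', 'a', 'big', 'dog']
```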
Upvotes: 2