Reputation: 49
I have a CSV file in which each row contains a list of adjectives.
For example, the first 2 rows are as follows:
["happy","sad","colorful"]
["horrible","sad","cheerful","happy"]
I want to extract all the data from this file into a single list that contains each adjective only once. Here, it would be:
["happy","sad","colorful","horrible","cheerful"]
I am doing this in Python:
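import csv
with open('adj.csv', 'r') as f:
    reader = csv.reader(f)
    adj_list = list(reader)
filtered_list = []
# Each l is a whole row (a list of adjectives), so this removes
# duplicate rows rather than duplicate adjectives.
for l in adj_list:
    if l not in filtered_list:
        filtered_list.append(l)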
Upvotes: 1
Views: 164
Reputation: 44615
Assuming you are only interested in a list of unique words where order does not matter:
# Option A1
import csv

with open("adj.csv", "r") as f:
    seen = set()
    reader = csv.reader(f)
    for line in reader:
        for word in line:
            seen.add(word)

list(seen)
# ['cheerful', 'colorful', 'horrible', 'happy', 'sad']
More concisely:
# Option A2
import csv

with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    unique_words = {word for line in reader for word in line}

list(unique_words)
The with statement safely opens and closes the file. We simply add every word to a set, which keeps only one copy of each. Converting the set with list() then gives a list of unique (unordered) words.
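For a quick check without a file on disk, here is a sketch of Option A2 that reads from an in-memory string via io.StringIO. The sample text and the helper name unique_adjectives are illustrative assumptions, not part of the original answer. The result is sorted only so the output is repeatable, since set order is arbitrary.

```python
import csv
import io

# Hypothetical in-memory stand-in for adj.csv (assumption for illustration),
# so the example runs without a file on disk.
SAMPLE_CSV = "happy,sad,colorful\nhorrible,sad,cheerful,happy\n"


def unique_adjectives(csv_text):
    """Return the unique words across all rows, in sorted order."""
    reader = csv.reader(io.StringIO(csv_text))
    # The set comprehension flattens the rows and drops duplicates in one pass.
    unique_words = {word for line in reader for word in line}
    # Sets are unordered, so sort to get a deterministic list.
    return sorted(unique_words)


print(unique_adjectives(SAMPLE_CSV))
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```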
Alternatives
If order does matter, implement the unique_everseen itertools recipe.
From the itertools recipes:
import itertools as it


def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in it.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
You can implement this yourself or install a third-party library that provides it, such as more_itertools (e.g. pip install more_itertools):
# Option B
import csv
import more_itertools as mit

with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    words = (word for line in reader for word in line)
    unique_words = list(mit.unique_everseen(words))

unique_words
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
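If you would rather avoid the extra dependency, a standard-library alternative (a different technique from the unique_everseen recipe) is dict.fromkeys, which relies on dicts preserving insertion order in Python 3.7+. A sketch, again with an in-memory sample in place of adj.csv and a hypothetical helper name unique_in_order:

```python
import csv
import io

# Hypothetical sample standing in for adj.csv (assumption for illustration).
SAMPLE_CSV = "happy,sad,colorful\nhorrible,sad,cheerful,happy\n"


def unique_in_order(rows):
    """Flatten rows and keep the first occurrence of each word, in order."""
    words = (word for row in rows for word in row)
    # dict keys are unique and, since Python 3.7, keep insertion order,
    # so dict.fromkeys drops later duplicates without reordering.
    return list(dict.fromkeys(words))


rows = csv.reader(io.StringIO(SAMPLE_CSV))
print(unique_in_order(rows))
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
```

Unlike unique_everseen, this has no key argument, so it only fits the case where words are compared exactly as read.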
Upvotes: 0
Reputation:
Assuming memory is not a concern and a one-liner is what you are looking for:
from itertools import chain
from csv import reader
print(list(set(chain(*reader(open('file.csv'))))))
With file.csv containing:
happy, sad, colorful
horrible, sad, cheerful, happy
OUTPUT:
['horrible', ' colorful', ' sad', ' cheerful', ' happy', 'happy']
You can drop the list() call if you don't mind receiving a Python set instead of a list.
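In the output above, the space after each comma is kept, so ' happy' and 'happy' are counted as two different words. A sketch of one way to avoid that, assuming the same file content (held in an in-memory string here for illustration), is to pass skipinitialspace=True to csv.reader. chain.from_iterable is used instead of unpacking with * so the rows are consumed lazily, and the result is sorted only to make the output repeatable.

```python
import csv
import io
from itertools import chain

# Hypothetical in-memory copy of the file.csv content shown above,
# including the space after each comma.
SAMPLE_CSV = "happy, sad, colorful\nhorrible, sad, cheerful, happy\n"

# skipinitialspace=True tells the reader to ignore whitespace right after
# each delimiter, so ' happy' and 'happy' become the same word.
reader = csv.reader(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
unique_words = sorted(set(chain.from_iterable(reader)))
print(unique_words)
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```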
Upvotes: 1