floralmural

Reputation: 49

Extracting data from csv

I have a csv file with each row containing lists of adjectives.

For example, the first 2 rows are as follows:

["happy","sad","colorful"]
["horrible","sad","cheerful","happy"]

I want to extract all the data from this file to get a list containing each adjective only once. Here, it would be the following list:

["happy","sad","colorful","horrible","cheerful"]

I am doing this using Python.

import csv

with open('adj.csv', 'rb') as f:
    reader = csv.reader(f)
    adj_list = list(reader)
    filtered_list = []
    for l in adj_list:
        # compare against the list being built
        if l not in filtered_list:
            filtered_list.append(l)

Upvotes: 1

Views: 164

Answers (2)

pylang

Reputation: 44615

Assuming you are only interested in a list of unique words where order does not matter:

# Option A1
import csv


with open("adj.csv", "r") as f:
    seen = set()
    reader = csv.reader(f)
    for line in reader:
        for word in line:
            seen.add(word)
list(seen)
# ['cheerful', 'colorful', 'horrible', 'happy', 'sad']

More concisely:

# Option A2
with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    unique_words = {word for line in reader for word in line}

list(unique_words)

The with statement ensures the file is closed once the block exits. Every word is added to a set, which discards duplicates; passing that set to list() then gives a list of unique words in arbitrary order.


Alternatives

If order does matter, implement the unique_everseen itertools recipe.

From itertools recipes:

import itertools as it


def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in it.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

You can implement this yourself or install a third-party library that provides it, such as more_itertools (pip install more_itertools):

# Option B
import csv

import more_itertools as mit


with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    words = (word for line in reader for word in line)
    unique_words = list(mit.unique_everseen(words))

unique_words
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
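
If installing a third-party package is not an option, a standard-library sketch along the same lines is possible with dict.fromkeys, which keeps keys unique and (on Python 3.7+) in insertion order. This swaps in a different technique from the recipe above; the helper name unique_words_ordered and the in-memory sample standing in for adj.csv are assumptions made for illustration.

```python
import csv
import io


def unique_words_ordered(csv_lines):
    """Return each word once, in first-seen order (hypothetical helper)."""
    reader = csv.reader(csv_lines)
    words = (word for row in reader for word in row)
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(words))


# In-memory stand-in for adj.csv so the sketch runs without a file.
sample = io.StringIO("happy,sad,colorful\nhorrible,sad,cheerful,happy\n")
print(unique_words_ordered(sample))
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
```

With a real file, passing the open file object (for example the f from the with block above) in place of sample should behave the same way.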

Upvotes: 0

user1785721

Reputation:

Supposing that "memory is not important" and that a one-liner is what you are looking for:

from itertools import chain
from csv import reader

print(list(set(chain(*reader(open('file.csv'))))))

With 'file.csv' containing:

happy, sad, colorful
horrible, sad, cheerful, happy

OUTPUT:

['horrible', ' colorful', ' sad', ' cheerful', ' happy', 'happy']

You can drop the list() call if you don't mind receiving a Python set instead of a list.
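
Note that the output above contains both ' happy' and 'happy': the space after each comma in the file is kept as part of the word, so the two are treated as different strings. A minimal sketch of one way around this, assuming the file really does have a space after each delimiter, is to pass skipinitialspace=True to the reader. The in-memory string below is a stand-in for 'file.csv'.

```python
import csv
import io
from itertools import chain

# In-memory stand-in for file.csv, with a space after each comma.
data = "happy, sad, colorful\nhorrible, sad, cheerful, happy\n"

# skipinitialspace=True drops whitespace immediately following the delimiter.
rows = csv.reader(io.StringIO(data), skipinitialspace=True)
unique_words = set(chain.from_iterable(rows))

# Sorted only to make the printed result deterministic.
print(sorted(unique_words))
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```

chain.from_iterable is used here instead of chain(*...) so the rows are consumed lazily rather than unpacked all at once.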

Upvotes: 1
