user11727742

How to convert CSV data into a dictionary using itertools.groupby

I have a text file, job.txt, with the following contents:

job,salary
Developer,29000
Developer,28000
Tester,27000
Tester,26000

My current code is:

with open(r'C:\Users\job.txt') as f:
    file_content = f.readlines()
data = {}
for i, line in enumerate(file_content):
    if i == 0:
        continue
    job, salary = line.split(",")
    job = job.strip()
    salary = int(salary.strip())
    if job not in data:
        data[job] = []
    data[job].append(salary)
print("data =", data)

My expected result is:

data = {'Developer': [29000, 28000], 'Tester': [27000, 26000]}

How can I convert my code to use itertools.groupby?

Upvotes: 2

Views: 148

Answers (3)

ggorlen

Reputation: 56895

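groupby only merges consecutive items that share a key, so you can only rely on it if your data is already chunked into categories (all rows for a given job are adjacent, as in your example).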
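A quick illustration of that caveat, using hypothetical interleaved rows rather than the question's file: groupby starts a new group every time the key changes, so a job that appears in two separate runs yields two groups, and a dict comprehension keeps only the last one.

```python
from itertools import groupby

# Hypothetical rows where the jobs are interleaved instead of chunked.
rows = [("Developer", 29000), ("Tester", 27000), ("Developer", 28000), ("Tester", 26000)]

# groupby only merges *consecutive* rows with the same key.
keys = [k for k, _ in groupby(rows, key=lambda r: r[0])]
print(keys)  # ['Developer', 'Tester', 'Developer', 'Tester']

# Building a dict from those groups silently overwrites the earlier groups.
data = {k: [s for _, s in g] for k, g in groupby(rows, key=lambda r: r[0])}
print(data)  # {'Developer': [28000], 'Tester': [26000]}
```

Sorting the rows by the same key before calling groupby avoids this, at the cost of an extra O(n log n) pass.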

from itertools import groupby

with open("job.txt") as f:
    rows = [x.split(",") for x in f.readlines()[1:]]

data = {
    k.strip(): [int(y[1]) for y in v]
    for k, v in groupby(rows, key=lambda x: x[0])
}

With that in mind, I think a defaultdict is more appropriate here. It collects rows for each job regardless of their order, and it's simply less clever. Additionally, there's no need to read the whole file into memory or to sort it (if the rows are unordered). Use dict(data) at the end if you don't want the defaultdict subclass.

from collections import defaultdict

data = defaultdict(list)

with open("job.txt") as f:
    for i, line in enumerate(f):
        if i:
            job, salary = [x.strip() for x in line.split(",")]
            data[job].append(int(salary))
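A small sketch of the "no sorting needed" point, assuming hypothetical out-of-order file contents; io.StringIO stands in for open("job.txt") so it runs without a file on disk, and the header is skipped with next(f) rather than enumerate:

```python
import io
from collections import defaultdict

# Hypothetical file contents with the jobs interleaved.
# io.StringIO stands in for open("job.txt") in this sketch.
f = io.StringIO("job,salary\nTester,27000\nDeveloper,29000\nTester,26000\nDeveloper,28000\n")

data = defaultdict(list)
next(f)  # skip the header row
for line in f:
    job, salary = [x.strip() for x in line.split(",")]
    data[job].append(int(salary))

result = dict(data)  # plain dict instead of the defaultdict subclass
print(result)  # {'Tester': [27000, 26000], 'Developer': [29000, 28000]}
```

Keys appear in the order each job is first seen, and each list keeps the salaries in file order.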

As mentioned in the accepted answer, do prefer the csv module if your actual data is any more complicated than your example. CSV files can be tricky to parse (quoted fields, embedded commas), and there's no reason to reinvent the wheel.
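One way to combine the csv module with groupby, as a sketch that assumes the rows for each job are already adjacent (as in the question's file); io.StringIO stands in for open("job.txt", newline=""):

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# io.StringIO stands in for open("job.txt", newline="") in this sketch.
f = io.StringIO("job,salary\nDeveloper,29000\nDeveloper,28000\nTester,27000\nTester,26000\n")

reader = csv.reader(f)
next(reader)  # skip the header row ["job", "salary"]

# Assumes rows for the same job are adjacent; otherwise sort them first.
data = {
    job: [int(salary) for _, salary in group]
    for job, group in groupby(reader, key=itemgetter(0))
}
print(data)  # {'Developer': [29000, 28000], 'Tester': [27000, 26000]}
```

csv.reader already strips the line endings, so only the int() conversion is needed.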

Upvotes: 0

oppressionslayer

Reputation: 7204

Try this if pandas is an option:

from collections import defaultdict
import pandas as pd

# Read the CSV (header row included) and turn it into [job, salary] lists
d = pd.read_csv('job.txt').to_numpy().tolist()
res = defaultdict(list)
for job, salary in d:
    res[job].append(salary)
d = dict(res)

print(d)
# {'Developer': [29000, 28000], 'Tester': [27000, 26000]}

Upvotes: 0

Arunmozhi

Reputation: 964

Here is the code that will generate the dictionary you wanted.

from itertools import groupby

data = [
    ["Developer",29000],
    ["Developer",28000],
    ["Tester",27000],
    ["Tester",26000]
]

def keyfunc(e):
    return e[0]

unique_keys = {}
data = sorted(data, key=keyfunc)

for k, g in groupby(data, keyfunc):
    unique_keys[k] = [i[1] for i in g]


print(unique_keys)
# {'Developer': [29000, 28000], 'Tester': [27000, 26000]}

P.S.: I would suggest using the csv module to read the file instead of parsing it yourself.
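Following that suggestion, a sketch that reads the file with csv.DictReader and sorts before grouping, so it also works when the rows are out of order; the shuffled data is hypothetical and io.StringIO stands in for open("job.txt", newline=""):

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# io.StringIO stands in for open("job.txt", newline="").
# The rows are deliberately out of order (hypothetical data).
f = io.StringIO("job,salary\nTester,27000\nDeveloper,29000\nTester,26000\nDeveloper,28000\n")

# DictReader takes its keys from the header line, so no manual skipping is needed.
rows = sorted(csv.DictReader(f), key=itemgetter("job"))

# sorted() is stable, so each job's salaries keep their file order.
data = {
    job: [int(row["salary"]) for row in group]
    for job, group in groupby(rows, key=itemgetter("job"))
}
print(data)  # {'Developer': [29000, 28000], 'Tester': [27000, 26000]}
```

Note that the resulting keys come out in alphabetical order, not in the order jobs first appear in the file.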

Upvotes: 1
