I have a text file, job.txt, with the following contents:
job,salary
Developer,29000
Developer,28000
Tester,27000
Tester,26000
My code is:
with open(r'C:\Users\job.txt') as f:
    file_content = f.readlines()
data = {}
for i, line in enumerate(file_content):
    if i == 0:
        continue  # skip the header row
    job, salary = line.split(",")
    job = job.strip()
    salary = int(salary.strip())
    if job not in data:
        data[job] = []
    data[job].append(salary)
print("data =", data)
My expected result is:
data = {'Developer': [29000, 28000], 'Tester': [27000, 26000]}
How can I convert my code to use itertools.groupby?
Upvotes: 2
Views: 148
Reputation: 56895
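You can only rely on groupby if your data is already chunked into categories, that is, if all rows with the same key are adjacent. groupby starts a new group every time the key changes, so unsorted input yields several groups for the same key.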
from itertools import groupby
with open("job.txt") as f:
    rows = [x.split(",") for x in f.readlines()[1:]]
data = {
    k.strip(): [int(y[1]) for y in v]
    for k, v in groupby(rows, key=lambda x: x[0])
}
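To make the caveat concrete, here is a small sketch of what happens when equal keys are not adjacent. The interleaved rows and the helper name group_salaries are made up for illustration and are not part of the question's file.

```python
from itertools import groupby

# Hypothetical rows where the "Developer" entries are NOT adjacent.
rows = [
    ("Developer", 29000),
    ("Tester", 27000),
    ("Developer", 28000),
    ("Tester", 26000),
]

def group_salaries(rows):
    """Build {job: [salaries]} with groupby; correct only if equal jobs are adjacent."""
    return {
        job: [salary for _, salary in group]
        for job, group in groupby(rows, key=lambda row: row[0])
    }

# groupby emits four separate groups here, and later groups overwrite earlier keys.
unsorted_result = group_salaries(rows)

# Sorting by the same key first makes equal jobs adjacent (sorted() is stable,
# so salaries keep their original relative order within each job).
sorted_result = group_salaries(sorted(rows, key=lambda row: row[0]))

print(unsorted_result)  # {'Developer': [28000], 'Tester': [26000]}
print(sorted_result)    # {'Developer': [29000, 28000], 'Tester': [27000, 26000]}
```

The unsorted call silently loses data rather than raising an error, which is why the input order matters so much with groupby.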
With that in mind, I think a defaultdict is more appropriate here. It works regardless of how the rows are ordered, and it's just less clever. Additionally, there's no need to slurp the file into memory or sort it (if unordered). Use dict(data) at the end if you don't like the defaultdict subclass.
from collections import defaultdict
data = defaultdict(list)
with open("job.txt") as f:
    for i, line in enumerate(f):
        if i:  # skip the header row (i == 0)
            job, salary = [x.strip() for x in line.split(",")]
            data[job].append(int(salary))
As mentioned in the accepted answer, do prefer the csv module if your actual data is at all more complicated than your example. CSV files can be difficult to parse and there's no reason to reinvent the wheel.
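A minimal sketch of the csv-module variant might look like the following. It assumes the header names job and salary from the question, and it uses an in-memory io.StringIO as a stand-in for the real file so the snippet runs on its own; with an actual file you would pass open("job.txt", newline="") instead.

```python
import csv
import io
from collections import defaultdict

# Stand-in for open("job.txt", newline=""); contents copied from the question.
sample = io.StringIO(
    "job,salary\n"
    "Developer,29000\n"
    "Developer,28000\n"
    "Tester,27000\n"
    "Tester,26000\n"
)

data = defaultdict(list)
# DictReader consumes the first row as field names, so no manual header skip is needed.
for row in csv.DictReader(sample):
    data[row["job"].strip()].append(int(row["salary"]))

data = dict(data)  # optional: convert back to a plain dict
print("data =", data)
```

The csv module also takes care of quoted fields and embedded commas, which a plain split(",") would get wrong.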
Upvotes: 0
Reputation: 7204
Try this if pandas is an option:
from collections import defaultdict
import pandas as pd
d = pd.read_csv('job.txt').to_numpy().tolist()
res = defaultdict(list)
for v, k in d:
    res[v].append(k)
d = dict(res)
d
# {'Developer': [29000, 28000], 'Tester': [27000, 26000]}
Upvotes: 0
Reputation: 964
Here is the code that will generate the dictionary you wanted.
from itertools import groupby
data = [
    ["Developer", 29000],
    ["Developer", 28000],
    ["Tester", 27000],
    ["Tester", 26000]
]
def keyfunc(e):
    return e[0]
unique_keys = {}
# groupby only groups adjacent items, so sort by the same key first
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    unique_keys[k] = [i[1] for i in g]
>>> print(unique_keys)
{'Developer': [29000, 28000], 'Tester': [27000, 26000]}
P.S: I would suggest using the csv module to read the file instead of doing it yourself.
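As a rough sketch of that suggestion combined with the groupby approach, the rows can come from csv.reader instead of a hard-coded list. The io.StringIO object below is only a stand-in for the opened job.txt, and itemgetter is swapped in for keyfunc purely for brevity.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Stand-in for open("job.txt", newline=""); contents copied from the question.
sample = io.StringIO(
    "job,salary\n"
    "Developer,29000\n"
    "Developer,28000\n"
    "Tester,27000\n"
    "Tester,26000\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

# Sort by job so that equal keys are adjacent before grouping.
rows = sorted(reader, key=itemgetter(0))

unique_keys = {
    job: [int(salary) for _, salary in group]
    for job, group in groupby(rows, key=itemgetter(0))
}
print(unique_keys)
```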
Upvotes: 1