Reputation: 294
I would like to normalize all values in the dictionary data and store them in another dictionary with the same keys, where the values for each key are stored in a 1D array (a flat list). So I did the following:
>>> data = {1: [0.6065306597126334], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}
>>> norm = {k: [v / sum(vals) for v in vals] for k, vals in data.items()}
>>> norm
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}
Now suppose the dictionary data contains only a zero value for one of its keys, like the value of the first key 1:
>>> data = {1: [0.0], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}
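then normalizing the values of this dictionary gives [nan] for that key because of the division by zero (0.0 / 0.0). The warning below comes from NumPy, so the values are apparently NumPy floats; with plain Python floats the same comprehension raises ZeroDivisionError instead: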
>>> norm = {k: [v / sum(vals) for v in vals] for k, vals in data.items()}
__main__:1: RuntimeWarning: invalid value encountered in double_scalars
>>> norm
{1: [nan], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}
So I inserted an if statement to work around this issue, but now I can't store the values for each key as a 1D array. The code:
>>> norm = {}
>>> for k, vals in data.items():
...     values = []
...     if sum(vals) == 0:
...         values.append(list(vals))
...     else:
...         for v in vals:
...             values.append(list([v/sum(vals)]))
...     norm[k] = values
...
>>> norm
{1: [[0.0]], 2: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]], 3: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]], 4: [[0.5], [0.5]]}
I would like to get the norm dictionary as:
norm = {1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}
Also, when the dictionary data contains a zero value for one of its keys, is there a better way to normalize it? I think my solution is not efficient.
P.S.: At the end of the for loop I tried norm[k] = np.array(values) instead of norm[k] = values, but the result was not as required.
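The nested output comes from appending a one-element list for every value, so np.array(values) also ends up two-dimensional, with shape (n, 1); calling .ravel() on it would give a 1D array. A minimal plain-Python sketch of flattening such a nested result after the fact; nested and flat are hypothetical names, and the data is a subset of the output shown above:

```python
# Hypothetical name: a nested result like the one produced by the loop above
nested = {2: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]],
          4: [[0.5], [0.5]]}

# Flatten each list of one-element lists into a single flat list
flat = {k: [x for inner in vals for x in inner] for k, vals in nested.items()}

print(flat[4])  # [0.5, 0.5]
```

Avoiding the nesting in the first place (see the answers below) is cleaner than flattening afterwards.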
Upvotes: 2
Views: 5786
Reputation: 1
This should work as well:
norm = {k: [v / sum(vals) for v in vals] if sum(vals)!=0 else [1] for k, vals in data.items() }
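A quick self-contained check of this one-liner on a subset of the question's data, assuming plain Python floats. Note that a zero-sum key always becomes [1], however many values it had, and that sum(vals) is evaluated once per element:

```python
data = {1: [0.0],
        2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127],
        4: [0.6065306597126334, 0.6065306597126334]}

# Normalize each list by its sum; a zero-sum list is replaced by [1]
norm = {k: [v / sum(vals) for v in vals] if sum(vals) != 0 else [1]
        for k, vals in data.items()}

print(norm[1], norm[4])  # [1] [0.5, 0.5]
```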
Upvotes: 0
Reputation: 8180
Your dict/list comprehension fails when sum(vals) == 0:
>>> data = {1: [0.0], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}
>>> {k: [v / sum(vals) for v in vals] for k, vals in data.items()}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <dictcomp>
File "<stdin>", line 1, in <listcomp>
ZeroDivisionError: float division by zero
You can introduce a ternary expression to handle the case:
>>> {k: [v / sum(vals) if sum(vals)!=0 else 1.0 for v in vals] for k, vals in data.items()}
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}
If you want to avoid evaluating sum(vals) multiple times:
>>> {k: [v / s if s!=0 else 1.0 for v in vals] for k,vals,s in ((k, vals, sum(vals)) for k, vals in data.items())}
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}
((k, vals, sum(vals)) for k, vals in data.items()) is a generator that yields k, vals and sum(vals) for every item.
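Another way to evaluate sum(vals) only once per key is to move the logic into a small function. A minimal sketch; the name normalize and the fill value of 1.0 for all-zero lists are assumptions chosen to match the output above. Unlike a variant that returns [1], this keeps one entry per original value:

```python
def normalize(vals, fill=1.0):
    """Return vals scaled to sum to 1; if the sum is 0, return fill for each value."""
    total = sum(vals)  # computed once per list
    if total == 0:
        return [fill] * len(vals)
    return [v / total for v in vals]

data = {1: [0.0],
        2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127],
        4: [0.6065306597126334, 0.6065306597126334]}

norm = {k: normalize(vals) for k, vals in data.items()}
print(norm[1], norm[4])  # [1.0] [0.5, 0.5]
```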
Upvotes: 0
Reputation: 1153
As mentioned in another answer, extend can be used to solve your problem. If you do want to use append, you could take the first element of your lists:
norm = {}
for k, vals in data.items():
    values = []
    if sum(vals) == 0:
        values.append(vals[0])
    else:
        for v in vals:
            values.append([v / sum(vals)][0])
    norm[k] = values
See "Difference between append vs. extend list methods in Python" for an example of append vs. extend.
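For comparison, a sketch of the question's loop using extend instead of append, so each key maps to a flat list. It keeps the question's behaviour of copying the original values when the sum is 0, and computes the sum once per key:

```python
data = {1: [0.0],
        2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127],
        4: [0.6065306597126334, 0.6065306597126334]}

norm = {}
for k, vals in data.items():
    values = []
    total = sum(vals)
    if total == 0:
        values.extend(vals)  # keep the original (all-zero) values
    else:
        # extend adds each quotient individually, so no nested lists
        values.extend(v / total for v in vals)
    norm[k] = values

print(norm[1], norm[4])  # [0.0] [0.5, 0.5]
```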
As for optimization: completely removing the for loops won't be possible, but you can shorten your solution while keeping it readable:
norm = {}
for k, vals in data.items():
    if sum(vals) == 0:
        norm[k] = vals
    else:
        norm[k] = [x / sum(vals) for x in vals]
Upvotes: 1
Reputation: 58
append, as mentioned above, adds a single element to a list, and that element can itself be a list; that's why you currently get a list within a list. Ideally, you should use extend, which concatenates the other list onto the first one.
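A minimal illustration of the difference on two small lists:

```python
a = [0.5]
a.append([0.5])   # adds the list itself as one element
print(a)          # [0.5, [0.5]]

b = [0.5]
b.extend([0.5])   # adds each element of the list
print(b)          # [0.5, 0.5]
```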
Upvotes: 1