Reputation: 87
I have two dictionaries that I use as sparse vectors:
dict1 = {'a': 1, 'b': 4}
dict2 = {'a': 2, 'c': 2}
I wrote my own __add__ function to get this desired result:
dict1 = {'a': 3, 'b': 4, 'c': 2}
It is important that I know the strings 'a', 'b' and 'c' for each corresponding value. Just making sure that I add up the correct dimensions is not enough. I will also get many more, previously unknown strings with some values that I just add to my dictionary at the moment.
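For what it's worth, the merge-and-add behavior described here is exactly what collections.Counter in the standard library provides; a minimal sketch:

```python
from collections import Counter

dict1 = {'a': 1, 'b': 4}
dict2 = {'a': 2, 'c': 2}

# Counter addition sums values key-by-key; keys missing from one
# side count as 0. Caveat: + drops keys whose summed value is <= 0;
# use c1.update(c2) instead if such keys must be kept.
merged = dict(Counter(dict1) + Counter(dict2))
# merged == {'a': 3, 'b': 4, 'c': 2}
```

Counter objects are still plain dicts underneath, so previously unknown keys are absorbed automatically.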
Now my question: is there a more efficient data structure out there? I looked at NumPy's arrays and SciPy's sparse matrices, but as far as I understand they are not really of any help here, or am I just not seeing the solution?
I could keep keys and values in separate arrays but I don't think I can just use any already existing function to get the desired result.
dict1_keys = np.array(['a', 'b'])
dict1_values = np.array([1, 4])
dict2_keys = np.array(['a', 'c'])
dict2_values = np.array([2, 2])
# is there anything that will efficiently produce the following?
dict1_keys = np.array(['a', 'b', 'c'])
dict1_values = np.array([3, 4, 2])
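One way to get that result with plain NumPy (assuming the key arrays hold strings, as above) is np.unique with return_inverse followed by np.add.at; note that np.unique also sorts the merged keys:

```python
import numpy as np

dict1_keys = np.array(['a', 'b'])
dict1_values = np.array([1, 4])
dict2_keys = np.array(['a', 'c'])
dict2_values = np.array([2, 2])

# stack both vectors, then collapse duplicate keys
all_keys = np.concatenate([dict1_keys, dict2_keys])
all_values = np.concatenate([dict1_values, dict2_values])

# merged_keys is the sorted set of unique keys; inverse maps each
# original entry to its slot in merged_keys
merged_keys, inverse = np.unique(all_keys, return_inverse=True)

# unbuffered scatter-add: duplicate slots accumulate correctly
merged_values = np.zeros(len(merged_keys), dtype=all_values.dtype)
np.add.at(merged_values, inverse, all_values)
# merged_keys   -> ['a', 'b', 'c']
# merged_values -> [3, 4, 2]
```

This is a sketch, not a claim that it beats a dict: for two small vectors the concatenate/unique overhead will dominate, and it only pays off when merging many vectors at once.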
Upvotes: 2
Views: 1278
Reputation: 19259
@sirfz's Pandas approach could be a one-liner using a pandas Series:
>>> pd.Series(dict1).add(pd.Series(dict2), fill_value=0)
a    3.0
b    4.0
c    2.0
dtype: float64
Or if your API requires dicts:
>>> dict(pd.Series(dict1).add(pd.Series(dict2), fill_value=0))
{'a': 3.0, 'b': 4.0, 'c': 2.0}
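The alignment step introduces NaN before fill_value is applied, which upcasts the result to float; if integer values matter, one possible fix is to cast back afterwards:

```python
import pandas as pd

dict1 = {'a': 1, 'b': 4}
dict2 = {'a': 2, 'c': 2}

# astype(int) undoes the float upcast caused by the NaN fill
merged = pd.Series(dict1).add(pd.Series(dict2), fill_value=0).astype(int).to_dict()
# merged == {'a': 3, 'b': 4, 'c': 2}
```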
Plus, this should handle mixed inputs of dicts or Series, or even scipy sparse matrix rows and sklearn Vectorizer output (sparse vectors/mappings).
Upvotes: 1