carence

Reputation: 87

More efficient solution? Dictionary as sparse vector

I have two dictionaries that I use as sparse vectors:

dict1 = {'a': 1, 'b': 4}
dict2 = {'a': 2, 'c': 2}

I wrote my own __add__ function to get this desired result:

dict1 = {'a': 3, 'b': 4, 'c': 2}

It is important that I keep the strings 'a', 'b' and 'c' associated with their values; just making sure that I add up the correct dimensions is not enough. I will also receive many more, previously unknown strings with values of their own, which at the moment I simply add to my dictionary.
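For reference, the kind of dictionary addition described above can be sketched as follows. This is only a minimal sketch, not the original __add__ implementation (which is not shown here); the function name add_sparse is hypothetical.

```python
def add_sparse(u, v):
    """Return a new dict holding the element-wise sum of two sparse vectors."""
    result = dict(u)  # copy so that neither input is mutated
    for key, value in v.items():
        # A key missing from u counts as 0, so unseen strings are simply added.
        result[key] = result.get(key, 0) + value
    return result


dict1 = {'a': 1, 'b': 4}
dict2 = {'a': 2, 'c': 2}
summed = add_sparse(dict1, dict2)
print(summed)  # {'a': 3, 'b': 4, 'c': 2}
```

Each addition touches only the keys of the second vector, so the cost grows with the number of non-zero entries rather than with the total number of known strings.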

Now my question: is there a more efficient data structure out there? I looked at NumPy arrays and SciPy sparse matrices, but as far as I understand they are not really of any help here. Or am I just not seeing the solution?

I could keep keys and values in separate arrays, but I don't think any existing function would give me the desired result.

import numpy as np

# keys are strings, so they need quotes
dict1_keys   = np.array(['a', 'b'])
dict1_values = np.array([1, 4])
dict2_keys   = np.array(['a', 'c'])
dict2_values = np.array([2, 2])

# is there anything that will efficiently produce the following?
dict1_keys   = np.array(['a', 'b', 'c'])
dict1_values = np.array([3, 4, 2])
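To make the separate-array layout concrete, here is a minimal sketch of one way such a merge could be written with NumPy alone, using np.unique and np.add.at. It is an illustration under the assumption that keys and values are kept in parallel 1-D arrays; the names all_keys, all_values, merged_keys and merged_values are made up for this example.

```python
import numpy as np

dict1_keys = np.array(['a', 'b'])
dict1_values = np.array([1, 4])
dict2_keys = np.array(['a', 'c'])
dict2_values = np.array([2, 2])

# Stack both vectors into one long list of (key, value) pairs.
all_keys = np.concatenate([dict1_keys, dict2_keys])
all_values = np.concatenate([dict1_values, dict2_values])

# np.unique returns the sorted distinct keys plus, for every input key,
# the position of that key within the sorted result.
merged_keys, inverse = np.unique(all_keys, return_inverse=True)

# Accumulate each value into the slot of its key; np.add.at handles
# repeated indices correctly, unlike plain fancy-index assignment.
merged_values = np.zeros(len(merged_keys), dtype=all_values.dtype)
np.add.at(merged_values, inverse, all_values)

print(merged_keys.tolist())    # ['a', 'b', 'c']
print(merged_values.tolist())  # [3, 4, 2]
```

Note that the keys come back in sorted order, not in insertion order.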

Upvotes: 2

Views: 1278

Answers (2)

hobs

Reputation: 19259

@sirfz's pandas approach can be a one-liner using pandas Series:

>>> import pandas as pd
>>> pd.Series(dict1).add(pd.Series(dict2), fill_value=0)
a    3.0
b    4.0
c    2.0
dtype: float64

Or, if your API requires dicts:

>>> dict(pd.Series(dict1).add(pd.Series(dict2), fill_value=0))
{'a': 3.0, 'b': 4.0, 'c': 2.0}

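Plus, since pd.Series accepts both dicts and existing Series, this should handle mixed inputs of either kind. Scipy sparse matrix rows and sklearn vectorizer output (sparse vectors/mappings) should work as well once they are converted to a dict or Series.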
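As an illustration of the mixed-input case, here is a small sketch that sums any number of vectors given as dicts or Series. The helper name sum_sparse is hypothetical, and the sketch assumes every input is something the pd.Series constructor accepts.

```python
import pandas as pd


def sum_sparse(first, *rest):
    """Sum sparse vectors given as dicts or pandas Series."""
    total = pd.Series(first)
    for vec in rest:
        # fill_value=0 treats a key missing on one side as 0 instead of NaN.
        total = total.add(pd.Series(vec), fill_value=0)
    return total


mixed = sum_sparse({'a': 1, 'b': 4}, pd.Series({'a': 2, 'c': 2}), {'c': 5})
result = mixed.to_dict()
print(result)  # {'a': 3.0, 'b': 4.0, 'c': 7.0}
```

The values come back as floats because aligning on missing keys goes through NaN; cast with astype(int) before converting if integers are needed.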

Upvotes: 1

sirfz

Reputation: 4277

Perhaps pandas is what you're looking for:

import numpy
import pandas

d1 = pandas.DataFrame(numpy.array([1, 4]), index=['a', 'b'], dtype="int32")
d2 = pandas.DataFrame(numpy.array([2, 2]), index=['a', 'c'], dtype="int32")

d1.add(d2, fill_value=0)

result:

   0
a  3
b  4
c  2

Upvotes: 2
