Reputation: 133
So I have two lists of data, looking like this (shortened):
[[1.0, 1403603100],
[0.0, 1403603400],
[2.0, 1403603700],
[0.0, 1403604000],
[None, 1403604300]]
[1.0, 1403603100],
[0.0, 1403603400],
[1.0, 1403603700],
[None, 1403604000],
[5.0, 1403604300]]
What I'm wanting to do is merge them, summing the first elements of each dataset, or making it 0.0 if either counter value is None. So the above example would become this:
[[2.0, 1403603100],
[0.0, 1403603400],
[3.0, 1403603700],
[0.0, 1403604000],
[0.0, 1403604300]]
This is what I've come up with so far, apologies if it's a bit cludgy.
def emit_datum(datapoints):
for datum in datapoints:
yield datum
def merge_data(data_set1, data_set2):
assert len(data_set1) == len(data_set2)
data_length = len(data_set1)
data_gen1 = emit_datum(data_set1)
data_gen2 = emit_datum(data_set2)
merged_data = []
for _ in range(data_length):
datum1 = data_gen1.next()
datum2 = data_gen2.next()
if datum1[0] is None or datum2[0] is None:
merged_data.append([0.0, datum1[1]])
continue
count = datum1[0] + datum2[0]
merged_data.append([count, datum1[1]])
return merged_data
I can only hope/assume that there's something cunning I can do with itertools or collections?
Upvotes: 3
Views: 277
Reputation: 180482
If you are making both values equal to 0.0 if either are None you just need a simple loop.
l1 = [1.0, 1403603100],
[0.0, 1403603400],
[2.0, 1403603700],
[0.0, 1403604000],
[None, 1403604300]]
l2 = [[1.0, 1403603100],
[0.0, 1403603400],
[1.0, 1403603700],
[None, 1403604000],
[5.0, 1403604300]]
final = []
assert len(l1)== len(l2)
for x, y in zip(l1, l2):
if x[0] is None or y[0] is None:
y[0] = 0.0
final.append(y)
else:
final.append([x[0] + y[0], x[-1]])
print final
[[2.0, 1403603100], [0.0, 1403603400], [3.0, 1403603700], [0.0, 1403604000], [0.0, 1403604300]]
In [51]: %timeit merge_data(l1,l2)
100000 loops, best of 3: 5.76 µs per loop
In [52]: %%timeit
....: final = []
....: assert len(l1)==len(l2)
....: for x, y in zip(l1, l2):
....: if x[0] is None or y[0] is None:
....: y[0] = 0.0
....: final.append(y)
....: else:
....: final.append([x[0] + y[0], x[-1]])
....:
100000 loops, best of 3: 2.64 µs per loop
Upvotes: 1
Reputation: 2476
You can make use of zip
, like so:
def merge(list1, list2):
returnlist = []
for x, y in zip(list1, list2):
if x[0] is None or y[0] is None:
returnlist.append([0.0, x[1]])
else:
returnlist.append([x[0] + y[0], x[1]])
return returnlist
zip
returns an iterator over tuples containing elements from each input list with the same index (i.e. (list1[0], list2[0])
, (list1[1], list2[1])
, etc.)
Upvotes: 0
Reputation: 35796
What about 'binning' the data based on the identifier, i.e. collecting all values corresponding to one identifier (e.g. 1403603400), and later sum it up. A dictionary is great for collecting all values corresponding to an identifier (key), and a defaultdict of type list makes this especially simple:
>>> data = [[1.0, 1403603100], [1.0, 1403603100],
... [0.0, 1403603400], [0.0, 1403603400],
... [2.0, 1403603700], [1.0, 1403603700],
... [0.0, 1403604000], [None, 1403604000],
... [None, 1403604300], [5.0, 1403604300]]
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for value, identifier in data:
... d[identifier].append(value)
...
Now we have the data sorted and can conditionally sum it:
>>> for identifier, valuelist in d.iteritems():
... if not None in valuelist:
... print identifier, sum(valuelist)
... else:
... print identifier, 0.0
...
1403603400 0.0
1403603700 3.0
1403603100 2.0
1403604300 0.0
1403604000 0.0
The last part in short, to obtain the list that you wanted to:
>>> [[i, sum(v)] if None not in v else [i, .0] for i, v in d.iteritems()]
[[1403603400, 0.0], [1403603700, 3.0], [1403603100, 2.0], [1403604300, 0.0], [1403604000, 0.0]]
That approach requires the data sets to be mixed in the first place, as you had in the first version of your example input.
Upvotes: 1
Reputation: 6361
use numpy array and you don't need to do any looping.this make your code faster if you are dealing with larger datasets.
import numpy as np
In [68]: a = np.asarray(a)
In [69]: b = np.asarray(b)
In [71]: a_none_idx = np.equal(a,None)
In [72]: b_none_idx = np.equal(b,None)
In [73]: a[a_none_idx]=0
In [74]: b[b_none_idx]=0
In [76]: c = np.zeros(a.shape)
In [77]: c[:,0]= a[:,0] + b[:,0]
In [78]: c
Out[78]:
array([[ 2., 0.],
[ 0., 0.],
[ 3., 0.],
[ 0., 0.],
[ 5., 0.]])
In [79]: c[a_none_idx]=0
In [80]: c[b_none_idx]=0
In [81]: c[:,1] = a[:,1]
In [82]: c
Out[82]:
array([[ 2.00000000e+00, 1.40360310e+09],
[ 0.00000000e+00, 1.40360340e+09],
[ 3.00000000e+00, 1.40360370e+09],
[ 0.00000000e+00, 1.40360400e+09],
[ 0.00000000e+00, 1.40360430e+09]]
Upvotes: 0