sedeh

Reputation: 7313

Python: merge lists or data frames and overwrite missing values

Assuming I have the following lists:

list1 = ['MI', '', 'NY', '', 'AR', '']
list2 = ['', 'MS', '', 'OH', '', '']

Anywhere there is a missing value or an empty string in list1, I want to overwrite the empty string with a corresponding value in list2. Is there an efficient way to do this without having to iterate through each item in list1? Below is my current solution:

list1 = ['MI', '', 'NY', '', 'AR', '']
list2 = ['', 'MS', '', 'OH', '', '']

counter = 0

for each in list1:
    counter = counter + 1
    if len(each) == 0:
        list1[counter-1] = list2[counter-1]
print(list1)
>>> ['MI', 'MS', 'NY', 'OH', 'AR', '']

I tried to convert my lists to pandas data frames and used pandas.DataFrame.update() but didn't get the result I was looking for. A similar problem is solved here but in R.

Upvotes: 4

Views: 1506

Answers (3)

roman

Reputation: 117540

You can use the pandas.Series.where() method, though I suppose there will still be an iteration:

>>> import pandas as pd
>>> s1 = pd.Series(list1)
>>> s2 = pd.Series(list2)
>>> s1.where(s1 != '', s2)
0    MI
1    MS
2    NY
3    OH
4    AR
5      

As for your original method, you don't need to maintain your own counter; you can use the enumerate() function:

>>> def isnull1(list1, list2):
...     res = []
...     for i, x in enumerate(list1):
...         if not x:
...             res.append(list2[i])
...         else:
...             res.append(x)
...     return res
... 
>>> isnull1(list1, list2)
['MI', 'MS', 'NY', 'OH', 'AR', '']
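
If the goal is to overwrite list1 itself, as in the question, enumerate() can also drive an in-place update. A minimal sketch, assuming both lists have the same length (the name fill_in_place is hypothetical, not part of the answer above):

```python
def fill_in_place(target, fallback):
    """Overwrite empty entries of target with the entry at the same index in fallback."""
    for i, value in enumerate(target):
        if not value:
            # Replacing an element (not adding or removing one) is safe while iterating.
            target[i] = fallback[i]
    return target


list1 = ['MI', '', 'NY', '', 'AR', '']
list2 = ['', 'MS', '', 'OH', '', '']

fill_in_place(list1, list2)
print(list1)
```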

But an even better solution would be to use zip() and map() (on Python 3, map() returns an iterator, so wrap it in list()):

>>> list(map(lambda x: x[1] if not x[0] else x[0], zip(list1, list2)))
['MI', 'MS', 'NY', 'OH', 'AR', '']

It's also better to use a generator if you don't need the list right away:

>>> def isnull2(list1, list2):
...     for i, x in enumerate(list1):
...         if not x:
...             yield list2[i]
...         else:
...             yield x
... 
>>> list(isnull2(list1, list2))
['MI', 'MS', 'NY', 'OH', 'AR', '']

Or, on Python 2, use imap() and izip() from itertools (both were removed in Python 3, where the built-in map() and zip() are already lazy):

>>> from itertools import izip, imap
>>> list(imap(lambda x: x[1] if not x[0] else x[0], izip(list1, list2)))
['MI', 'MS', 'NY', 'OH', 'AR', '']
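
A minimal Python 3 sketch of the same lazy approach, assuming the built-in map() and zip() (which return iterators there) stand in for imap() and izip(). The variable names lazy and result are only illustrative:

```python
list1 = ['MI', '', 'NY', '', 'AR', '']
list2 = ['', 'MS', '', 'OH', '', '']

# Nothing is computed here: map() and zip() only build lazy iterators on Python 3.
lazy = map(lambda pair: pair[0] or pair[1], zip(list1, list2))

# The values are produced only when the iterator is consumed.
result = list(lazy)
print(result)
```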

Upvotes: 1

coyotevz

Reputation: 1

Maybe this can help:

def list_default(l1, l2):
    i1 = iter(l1)
    i2 = iter(l2)
    for i in i1:
        next_default = next(i2)
        if not i:
            yield next_default
        else:
            yield i

list1 = ['MI', '', 'NY', '', 'AR', '']
list2 = ['', 'MS', '', 'OH', '', '']

print(list(list_default(list1, list2)))
>>> ['MI', 'MS', 'NY', 'OH', 'AR', '']

You must iterate over the sequence to find the missing values; with the function above you avoid having to track the list indices.

Sorry for my English
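
The function above runs out of defaults if the second list is shorter than the first. One possible variant, offered only as a sketch, pads the defaults with empty strings using itertools.chain() and itertools.repeat(); the name list_default_padded is hypothetical:

```python
from itertools import chain, repeat


def list_default_padded(l1, l2):
    """Yield each item of l1, falling back to the matching item of l2 when it is empty."""
    # Pad l2 with '' forever; zip() stops as soon as l1 is exhausted.
    padded_defaults = chain(l2, repeat(''))
    for item, default in zip(l1, padded_defaults):
        yield item if item else default


list1 = ['MI', '', 'NY', '', 'AR', '']
short_defaults = ['', 'MS']

print(list(list_default_padded(list1, short_defaults)))
```

Entries of l1 that have no counterpart in l2 are simply left as they are.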

Upvotes: 0

xbug

Reputation: 1472

There's a more 'Pythonic' way to do it (using list comprehensions), but you still get an iteration in the end:

[x or y for x, y in zip(list1, list2)]
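
One thing to keep in mind is that `x or y` treats every falsy value as missing, not only the empty string. A short sketch of the difference, where the helper is_missing and the sample values are assumptions made for illustration:

```python
list1 = ['MI', '', 'NY', '', 'AR', '']
list2 = ['', 'MS', '', 'OH', '', '']

# `x or y` picks y whenever x is falsy ('' and None, but also 0).
merged = [x or y for x, y in zip(list1, list2)]
print(merged)


def is_missing(value):
    """Hypothetical stricter test: only None and '' count as missing."""
    return value is None or value == ''


values = [0, '', None]
defaults = [1, 2, 3]

loose = [x or y for x, y in zip(values, defaults)]
strict = [y if is_missing(x) else x for x, y in zip(values, defaults)]
print(loose)
print(strict)
```

For lists of strings like the ones in the question, the two forms behave the same.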

Upvotes: 4
