Vincent Claes
Vincent Claes

Reputation: 4768

Repeat elements of list between each other until we reach a certain length

I have a list that starts with numbers and has nan's near the end

[1,2,3, nan, nan, nan, nan]

I want to spread out the available numbers over the entire length of my list until all nan's are overwritten.

There are nan's in our list, we start with the first number: [1, 1, 2, 3, nan, nan, nan] is there still nan left in the list? yes: [1,1,2,2,3, nan, nan] ... [1,1,2,2,3,3,nan] ... if all nan's are overwritten the list will look like

[1,1,1,2,2,3,3]

what is the most pythonic way to achieve this?

edit:

the nan is a numpy.nan, this can be replaced by None

Upvotes: 1

Views: 3336

Answers (5)

hpaulj
hpaulj

Reputation: 231355

Iterative solution

Here's an iterative solution, trying to match the logic in the question:

x = [1,2,3,None,None,None,None,None]  # replace nan with None

def onestep(cur,y,base):
    # one step of the iteration
    if cur is not None:
        y.append(cur)
        base.append(cur)
    else:
        y.append(base[0])  # append is simplest, for now
        base = base[1:]+[base[0]]  # rotate
    return base

y=[]
base=[]
for cur in x:
    base = onestep(cur,y,base)
    # print(base,y)
print(y)

which produces:

[1, 2, 3, 1, 2, 3, 1, 2]

I tried to keep track of a base index, the element of the base that I was going insert next. But it proved to be simpler to just rotate the base, and always use the 1st element.

I started with a j index that I stepped through x, but then realized I didn't need it to reference x[j]. I could just use for cur in x:.

Iterative solution - 2nd version

Lets refine it to collect the values in the desired order. To do this I need to keep track on an insertion point. But that proved to be difficult. So I switched to the idea I used before (see below) - collect sublists.

def onestep1(cur,y,i):
    # initial and extend sublists
    if cur is not None:
        y.append([cur])
    else:
        if i>len(y)-1:
            i = 0
        y[i].append(y[i][0])
        i += 1
    return i

y=[]
i = 0
for cur in x:
    i = onestep1(cur,y,i)
print(y, i)

producing

([[1, 1, 1], [2, 2, 2], [3, 3]], 2)

which can be flattened with chain:

print(list(itertools.chain(*y)))

producing:

[1, 1, 1, 2, 2, 2, 3, 3]

In this case I still need to keep track of a i index, indicating which sublist is up for incrementing.

An alternative to chain is a double comprehension:

[i for j in y for i in j]

Global repetition

An alternative is to step back from the iteration details, and look at the big picture.

Start with a list; lets replace nan (a numpy float) with None, a pure Python value. But it could be almost any distinct 'fill' value:

In [188]: x=[1,2,3,None,None,None,None,None]

where does it start?

In [190]: idx = x.index(None)
In [191]: idx
Out[191]: 3

What are the base values that we want to repeat?

In [192]: base = x[:idx]
In [193]: base
Out[193]: [1, 2, 3]

We want, in effect, to repeat each value of base as many times as needed to fill a list the length of x. Viewed this way it isn't really a replacement problem. It's, how to expand one list of n items into a list of m items.

In this example we want to repeat the 3 values cnt times:

In [194]: cnt=[3,3,2]

numpy has a nice function for applying cnt, np.repeat:

In [195]: np.repeat(base, cnt)
Out[195]: array([1, 1, 1, 2, 2, 2, 3, 3])

Sticking with lists we can do:

In [196]: [[i]*j for i,j in zip(base,cnt)]
Out[196]: [[1, 1, 1], [2, 2, 2], [3, 3]]

But this a nested list. A nice, idiomatic way of flattening a nested list is with itertools.chain. It is quite 'pythonic' to make use of tools that the standard library provides. It's good practice to become familiar with itertools:

In [197]: list(itertools.chain(*[[i]*j for i,j in zip(base, cnt)]))
Out[197]: [1, 1, 1, 2, 2, 2, 3, 3]

Another way to apply cnt is with a double comprehension (replacing both the list * and chain.

[i for i,c in zip(base, cnt) for _ in range(c)]

I could calculate the cnt list with the same sort of logic as in onestep (or onestep1), except accumulate a count value rather than sublist.

Or as the other answers show, we can reason from the length of base and the total length, using the integer division and modulus to figure out how many counts should be n, and how many n-1.

Mathematically the problem is:

Integers, m, n, such that:
8 = (3-m)*n + m*(n-1)
8 = 3*n -m*n + m*n - m = 3*n - m
n = (8//3) + 1 (= 3); m = 3*n - 8 (= 1)
cnt = [n]*(m-1) + [n-1]*(m) (=[3]*2 + [2]*1 = [3,3,2])

Pynchia's expression does the same thing:

rep_el = (rep+1 if p < rem else rep for p,el in enumerate(subl))

I would not consider any of these more 'pythonic' than the others. They are different strategies, but can all be written in good Python style.

Upvotes: 3

B. M.
B. M.

Reputation: 18628

For efficiency, a numpy solution :

import numpy as np
a=np.array([1,2,3,np.nan,np.nan,np.nan,np.nan]) # an example
n=len(a)
p=n-np.isnan(a).sum() # count of valids assuming nan's at the end.
repeats=n//p
m=n-repeats*p # number of overfilled.
q=m*(repeats+1) #border
a[q:]=a[m:p].repeat(repeats) #the end
a[:q]=a[:m].repeat(repeats+1) #the beginning

result :

array([ 1.,  1.,  1.,  2.,  2.,  3.,  3.])

Upvotes: 2

Pynchia
Pynchia

Reputation: 11590

Here's another solution that counts how many times each list element must be repeated

import itertools as it
l = [1, 2, 3, 'nan', 'nan', 'nan', 'nan']

off = l.index('nan') # offset of the first 'nan' element
subl = l[:off] # to avoid slicing more than once

rem = len(l) % off # remainder
rep = len(l) // off # min repetitions

rep_el = (rep+1 if p < rem else rep for p,el in enumerate(subl))
newl = it.chain(*(it.repeat(el,r) for el,r in zip(subl, rep_el)))

print(list(newl))

which produces

[1, 1, 1, 2, 2, 3, 3]

rem is the remainder of the division of the total number of elements in the list and the number of elements that aren't 'NaN'

rep is the minimum number of repetitions all elements have

rep_el is a generator that contains the repetitions each list element must have. If the position of the element is less than the remainder it means its repetition should be increased by one.

newl is the newly formed list. It is built by concatenating the repetitions generated by repeating each element for the correct number of times as calculated above.

Upvotes: 4

Eugene Soldatov
Eugene Soldatov

Reputation: 10135

You can do it like this (if I understand your question right):

my_list = [1, 2, 3, None, None, None, None]
result = [x for x in my_list if x is not None]
result.extend([result[i%len(result)] for i in xrange(len(my_list)-len(result))])
result.sort()

If you have to save original order just replace last string to:

result.sort(key=lambda x: my_list.index(x))

Upvotes: 2

Akshat Prakash
Akshat Prakash

Reputation: 45

r = '123'
i = 0
j = 0
while 'nan' in list_a:
  if i = len(r):
    i = 0
  if(list_a[j] == 'nan'):
    list_a[j] = r[i]
    i += 1
  j += 1

I think this should solve it.

Upvotes: 1

Related Questions