Using a while loop to compare and modify a Series

Question

For each row of my dataframe, I need to:

get the last word from a comma-separated list;
check whether this word is also the last word of another list in the Series;
if not, loop through the list from its end to find the first word that meets this condition.
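A minimal plain-Python sketch of these three steps, for reference. It assumes that "last word of another list" means the original last words (not ones already replaced), and it returns None when no word in a list qualifies, since that case is not specified above. The helper name pick_last is hypothetical.

```python
def pick_last(words, other_lasts):
    """Return the first word, scanning from the end, found in other_lasts.

    Returns None when no word qualifies (assumption: this case is left open).
    """
    for word in reversed(words):
        if word in other_lasts:
            return word
    return None


rows = ['6,f,e,w,m,i,n', '7,m,2,n,3,k,i', 'h,e,a,l,5,v,8', 'c,t,i,v,t,n,1',
        'o,q,k,2,p', '6,b,p,n,7,1,k', '3,u,v,q,e,1,z,w', 'm,h,o,b,8,6,n']
split_rows = [r.split(',') for r in rows]
lasts = [words[-1] for words in split_rows]

picked = []
for i, words in enumerate(split_rows):
    # Original last words of every row except the current one.
    others = set(lasts[:i] + lasts[i + 1:])
    picked.append(pick_last(words, others))

# picked -> ['n', 'k', None, 'n', 'k', '1', '1', 'n']
```

Row 2 ('h,e,a,l,5,v,8') yields None here because none of its words is the last word of another row.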

I took a Series containing lists of random characters as an example.

To update the 'Last' column, I tried to use a function containing a while loop, but I can't figure out how to get it done. What are the best practices to achieve this?

In[5]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
   'List': ['6,f,e,w,m,i,n', '7,m,2,n,3,k,i', 'h,e,a,l,5,v,8', 'c,t,i,v,t,n,1', 'o,q,k,2,p', '6,b,p,n,7,1,k', '3,u,v,q,e,1,z,w', 'm,h,o,b,8,6,n'
 ]})

In[6]:
df

Out[6]:
    List
0   6,f,e,w,m,i,n
1   7,m,2,n,3,k,i
2   h,e,a,l,5,v,8
3   c,t,i,v,t,n,1
4   o,q,k,2,p
5   6,b,p,n,7,1,k
6   3,u,v,q,e,1,z,w
7   m,h,o,b,8,6,n

In[14]:
df['Last'] = df['List'].str.split(',').str[-1]
df['List-length'] = df['List'].str.split(",").apply(len)
df['frequency'] = df.groupby('Last')['Last'].transform('count')
df 

Out[14]:
    List             Last   List-length  frequency
0   6,f,e,w,m,i,n     n         7          2
1   7,m,2,n,3,k,i     i         7          1
2   h,e,a,l,5,v,8     8         7          1
3   c,t,i,v,t,n,1     1         7          1
4   o,q,k,2,p         p         5          1
5   6,b,p,n,7,1,k     k         7          1
6   3,u,v,q,e,1,z,w   w         8          1
7   m,h,o,b,8,6,n     n         7          2

In[1]:
def avoid_singles(d):
    index = -2
    max_length = d['List-length'].max()
    # Count rows (not cells) whose last word is unique.
    number_of_singles = (d['frequency'] == 1).sum()
    # Stop once no singles remain or every list has been exhausted.
    while number_of_singles >= 1 and abs(index) <= max_length:
        d['Last'] = np.where((d['frequency'] == 1) & (d['List-length'] >= abs(index)), d['List'].str.split(",").str[index], d['Last'])
        d['frequency'] = d.groupby('Last')['Last'].transform('count')
        number_of_singles = (d['frequency'] == 1).sum()
        index -= 1

avoid_singles(df)

And the expected Last column:

a_guest · Accepted Answer

You can use DataFrame.apply to go through the samples and compute np.equal.outer between the characters of each sample and the last characters of all samples; np.argwhere then lets you select the first character that matches this condition:

import numpy as np
import pandas as pd

df = pd.DataFrame({'List': ['6,f,e,w,m,i,n', '7,m,2,n,3,k,i', 'h,e,a,l,5,v,8', 'c,t,i,v,t,n,1', 'o,q,k,2,p', '6,b,p,n,7,1,k', '3,u,v,q,e,1,z,w', 'm,h,o,b,8,6,n']})

def get_char(row):
    l_reverse = row.l[::-1]
    mask = np.equal.outer(l_reverse, tmp.l.str[-1])
    mask[:, row.i] = False  # Do not match with same row.
    mask[-1, 0] = True  # Force a match in the mask's last row so we fall back to the last element of l_reverse (the first item of the original list).
    return l_reverse[np.argwhere(mask)[0, 0]]  # Select the first matching character.

tmp = pd.DataFrame.from_dict(dict(
    l=df.List.str.split(','),
    i=np.arange(len(df))
))
df['Last'] = tmp.apply(get_char, axis=1)

Which outputs the following:

0    6,f,e,w,m,i,n    n
1    7,m,2,n,3,k,i    k
2    h,e,a,l,5,v,8    h
3    c,t,i,v,t,n,1    n
4        o,q,k,2,p    k
5    6,b,p,n,7,1,k    1
6  3,u,v,q,e,1,z,w    1
7    m,h,o,b,8,6,n    n

Note that samples 5 and 6 both output 1 (as opposed to the example you provided), but this is the first character that matches the condition according to the rules you specified: k is not the last character of any other row, but 1 is (sample 3).
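To see what the mask looks like for a single sample, here is a small standalone sketch for sample 1 ('7,m,2,n,3,k,i'). The arrays are built by hand with dtype=object as an illustration (an assumption for this sketch; the code above gets them from the temporary dataframe instead).

```python
import numpy as np

# Sample 1, reversed so that we scan from the end of the list.
l_reverse = np.array('7,m,2,n,3,k,i'.split(',')[::-1], dtype=object)
# Last character of every sample in the example dataframe.
lasts = np.array(['n', 'i', '8', '1', 'p', 'k', 'w', 'n'], dtype=object)

# mask[r, c] is True when l_reverse[r] equals the last character of sample c.
mask = np.equal.outer(l_reverse, lasts)
mask[:, 1] = False  # Ignore matches against sample 1 itself.

# argwhere lists (row, column) index pairs in row-major order,
# so the first pair belongs to the character closest to the end of the list.
hits = np.argwhere(mask)
first_hit = l_reverse[hits[0, 0]]

# hits.tolist() -> [[1, 5], [3, 0], [3, 7]]
# first_hit     -> 'k'
```

Here i (index 0 of the reversed list) only matched its own column, which was masked out, so the first remaining hit is k against sample 5.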
