ShanZhengYang

Reputation: 17631

How to construct a list comprehension with nested for loops and conditionals for pandas?

I'm having difficulty getting the following comprehension to work as expected. It combines nested for loops with conditionals.

Let me first explain what I'm doing:

import pandas as pd

dict1 = {'stringA':['ABCDBAABDCBD','BBXB'], 'stringB':['ABDCXXXBDDDD', 'AAAB'], 'num':[42, 13]}

df = pd.DataFrame(dict1)
print(df)
        stringA       stringB  num
0  ABCDBAABDCBD  ABDCXXXBDDDD   42
1          BBXB          AAAB   13

This DataFrame has two columns stringA and stringB with strings containing characters A, B, C, D, X. By definition, these two strings have the same length.

Based on these two columns, I create one dictionary per row that maps each index of stringA (starting at 0) to the corresponding index of stringB, which starts at num.

Here's the function I use:

def create_translation(x):
    x['translated_dictionary'] = {i: i + x['num'] for i, e in enumerate(x['stringA'])}
    return x

df2 = df.apply(create_translation, axis=1).groupby('stringA')['translated_dictionary']


df2.head()
0    {0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: ...
1                         {0: 13, 1: 14, 2: 15, 3: 16}
Name: translated_dictionary, dtype: object

print(df2.head()[0])
{0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: 48, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}

print(df2.head()[1])
{0: 13, 1: 14, 2: 15, 3: 16}

That's correct.

However, there are 'X' characters in these strings, which require a special rule: if the character at position i in stringA is X, don't create a key-value pair for i in the dictionary. If the character at position i in stringB is X, the value should be -500 instead of i + x['num'].

I tried the following dict comprehension inside a for loop:

def try1(x):
    for count, element in enumerate(x['stringB']):
        x['translated_dictionary'] = {i: -500 if element == 'X' else  i + x['num'] for i, e in enumerate(x['stringA']) if e != 'X'}
    return x

That gives the wrong answer.

df3 = df.apply(try1, axis=1).groupby('stringA')['translated_dictionary']

print(df3.head()[0]) ## this is wrong!
{0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: 48, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}

print(df3.head()[1])   ## this is correct! There is no key for 2:15!
{0: 13, 1: 14, 3: 16}

There are no -500 values!

The correct answer is:

print(df3.head()[0])
{0: 42, 1: 43, 2: 44, 3: 45, 4:-500, 5:-500, 6:-500, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}

print(df3.head()[1])
{0: 13, 1: 14, 3: 16}

Upvotes: 1

Views: 169

Answers (2)

John Zwinck

Reputation: 249133

Here's a simple way, without any comprehensions (because they aren't helping clarify the code):

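def create_translation(x):
    out = {}
    num = x['num']  # value for the current position; advanced on every character
    # Walk both strings in lockstep so a and b are the characters at the same index i.
    for i, (a, b) in enumerate(zip(x['stringA'], x['stringB'])):
        if a == 'X':
            pass  # X in stringA: no key for this position
        elif b == 'X':
            out[i] = -500  # X in stringB: placeholder value
        else:
            out[i] = num
        num += 1  # advance even when the position is skipped
    x['translated_dictionary'] = out
    return x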
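
For comparison, the attempt in the question fails because its outer loop rebuilds the whole dictionary once per character of stringB, so only the last character of stringB ends up deciding every value. If a comprehension is still preferred, zipping the two strings puts both conditions in a single comprehension. This is only a sketch: the helper name `translate` and its `placeholder` parameter are made up for illustration and are not part of the question.

```python
import pandas as pd

def translate(string_a, string_b, num, placeholder=-500):
    """Map each index i of string_a to i + num, following the X rules.

    Hypothetical helper: the name and the placeholder parameter are assumptions.
    """
    return {
        # X in string_b at this position: use the placeholder instead of i + num
        i: placeholder if b == 'X' else i + num
        for i, (a, b) in enumerate(zip(string_a, string_b))
        # X in string_a at this position: no key at all
        if a != 'X'
    }

df = pd.DataFrame({
    'stringA': ['ABCDBAABDCBD', 'BBXB'],
    'stringB': ['ABDCXXXBDDDD', 'AAAB'],
    'num': [42, 13],
})

# Build one dictionary per row without DataFrame.apply.
df['translated_dictionary'] = [
    translate(a, b, n)
    for a, b, n in zip(df['stringA'], df['stringB'], df['num'])
]
```

Whether this reads better than the explicit loop is a matter of taste; both apply the same rule at each position.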

Upvotes: 1

BENY

Reputation: 323226

Why not flatten your df? You can check this post, and then recreate the dict:

n = df.stringA.str.len()
newdf = pd.DataFrame({
    'num': df.num.repeat(n),
    'stringA': sum(list(map(list, df.stringA)), []),
    'stringB': sum(list(map(list, df.stringB)), []),
})

newdf = newdf.loc[newdf.stringA != 'X'].copy()  # remove rows where stringA is X
newdf['value'] = newdf.groupby('num').cumcount() + newdf.num  # cumcount within each num group, offset by num
newdf.loc[newdf.stringB == 'X', 'value'] = -500  # assign -500 where stringB is X
[dict(zip(x.groupby('num').cumcount(), x['value'])) for _, x in newdf.groupby('num')]  # one dict per num group
Out[390]: 
[{0: 13, 1: 14, 2: 15},
 {0: 42,
  1: 43,
  2: 44,
  3: 45,
  4: -500,
  5: -500,
  6: -500,
  7: 49,
  8: 50,
  9: 51,
  10: 52,
  11: 53}]
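
Note that the first dictionary above has keys 0, 1, 2 with values 13, 14, 15, whereas the question asks for {0: 13, 1: 14, 3: 16}. The difference arises because cumcount is computed after the X rows of stringA are dropped, so the remaining positions get renumbered. A possible variant, sketched under the assumption that keys should keep their original positions, records the position before filtering. The column names `row`, `a`, `b`, and `pos` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stringA': ['ABCDBAABDCBD', 'BBXB'],
    'stringB': ['ABDCXXXBDDDD', 'AAAB'],
    'num': [42, 13],
})

lengths = df['stringA'].str.len().to_numpy()

# One row per character, tagged with the label of the row it came from.
flat = pd.DataFrame({
    'row': np.repeat(df.index.to_numpy(), lengths),
    'a': [c for s in df['stringA'] for c in s],
    'b': [c for s in df['stringB'] for c in s],
})

# Record each character's position BEFORE dropping anything.
flat['pos'] = flat.groupby('row').cumcount()
flat['value'] = flat['pos'] + flat['row'].map(df['num'])
flat.loc[flat['b'] == 'X', 'value'] = -500   # X in stringB: placeholder value
flat = flat[flat['a'] != 'X']                # X in stringA: no key

# One dictionary per original row, keyed by original position.
result = {row: dict(zip(g['pos'], g['value'])) for row, g in flat.groupby('row')}
```

Grouping by the original row label rather than by num is a design choice in this sketch: it keeps two rows that happen to share the same num from being merged into one dictionary.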

Upvotes: 0
