Reputation: 1895

Create column in pandas dataframe based on condition

I have a dataframe and want to create a third column, col3, based on a condition: if the col2 value is present in col1, then 'Yes', else 'No'.

import pandas as pd

data = [[[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)],76546],[[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)],876546]]
test = pd.DataFrame(data, columns=['col1','col2'])

                                                col1    col2
0  [(330420, 0.9322496056556702), (76546, 0.93220...   76546
1  [(330420, 0.9322496056556702), (500826, 0.9322...  876546

Desired result:

data = [[[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)],76546, 'Yes'],[[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)],876546,'No']]
test = pd.DataFrame(data, columns=['col1','col2', 'col3'])

                                                col1    col2 col3
0  [(330420, 0.9322496056556702), (76546, 0.93220...   76546  Yes
1  [(330420, 0.9322496056556702), (500826, 0.9322...  876546   No

My Solution:

test['col3'] = [entry for tag in test['col2'] for entry in test['col1'] if tag in entry]

This raises an error: ValueError: Length of values does not match length of index

Upvotes: 4

Answers (4)

Mankind_2000

Reputation: 2218

Using numpy where:

import numpy as np

test['col3'] = test.apply(lambda x: np.where(str(x.col2) in [i[0] for i in x.col1], "yes", "no"), axis=1)
test['col3']
0    yes
1     no

Upvotes: 0

edesz

Reputation: 12406

You can do this using .apply():

def sublist_checker(row):
    check_both = ['Yes' if str(row['col2']) in sublist else 'No' for sublist in row['col1']]
    check_any = 'Yes' if 'Yes' in check_both else 'No'
    return check_any

test['col3'] = test.apply(sublist_checker, axis=1)
print(test)

                                                   col1    col2 col3
0   [(330420, 0.932249605656), (76546, 0.932200312614)]   76546  Yes
1  [(330420, 0.932249605656), (500826, 0.932200312614)]  876546   No

The function sublist_checker works row by row: it checks whether the col2 value (converted to a string) appears in any of the tuples in that row's col1 list, and returns 'Yes' or 'No' accordingly.
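For comparison, here is a more compact sketch of the same row-wise check. It assumes the first item of every tuple in col1 is the ID stored as a string, as in the question's sample data; the helper name id_in_pairs is made up for this illustration.

```python
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])


def id_in_pairs(row):
    # Hypothetical helper: compare col2 (as a string) with the first
    # item of each tuple, stopping at the first match.
    target = str(row['col2'])
    return 'Yes' if any(pair[0] == target for pair in row['col1']) else 'No'


test['col3'] = test.apply(id_in_pairs, axis=1)
print(test['col3'].tolist())
```

Comparing against pair[0] only means a score that happens to equal the ID can never produce a false match, and any() stops as soon as one tuple matches.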

Upvotes: 0

jpp

Reputation: 164783

You should avoid lists in series. Let's try a vectorised solution:

# extract array of values and reshape
arr = np.array(df.pop('col1').values.tolist()).reshape(-1, 4)

# join to dataframe and replace list of tuples
df = df.join(pd.DataFrame(arr, dtype=float))

# apply test via isin (a Series passed to isin is matched row by row on the index)
df['test'] = df.drop(columns='col2').isin(df['col2']).any(axis=1)

print(df)

     col2         0        1         2       3   test
0   76546  330420.0  0.93225   76546.0  0.9322   True
1  876546  330420.0  0.93225  500826.0  0.9322  False

Upvotes: 1

BENY

Reputation: 323356

Using any with zip:

[any(int(z[0]) == y for z in x) for x, y in zip(test.col1, test.col2)]
Out[227]: [True, False]
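The boolean list still has to be turned into the 'Yes'/'No' column the question asks for. One possible way, sketched here with np.where (a choice made for this example, not part of the original answer), rebuilds the question's sample data so it runs on its own:

```python
import numpy as np
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])

# One boolean per row: does any tuple's ID (cast to int) equal col2?
mask = [any(int(z[0]) == y for z in x)
        for x, y in zip(test['col1'], test['col2'])]

# Map True/False onto the labels requested in the question.
test['col3'] = np.where(mask, 'Yes', 'No')
print(test[['col2', 'col3']])
```

Because the list has exactly one entry per row, the assignment cannot hit the length mismatch from the question.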

Upvotes: 4
