Reputation: 1895
I have a dataframe and want to create a third column, col3, based on this condition: if the col2 value is present in col1, then 'Yes', else 'No'.
import pandas as pd
data = [[[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)],76546],[[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)],876546]]
test = pd.DataFrame(data, columns=['col1','col2'])
col1 col2
0 [(330420, 0.9322496056556702), (76546, 0.93220... 76546
1 [(330420, 0.9322496056556702), (500826, 0.9322... 876546
Desired result:
data = [[[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)],76546, 'Yes'],[[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)],876546,'No']]
test = pd.DataFrame(data, columns=['col1','col2', 'col3'])
col1 col2 col3
0 [(330420, 0.9322496056556702), (76546, 0.93220... 76546 Yes
1 [(330420, 0.9322496056556702), (500826, 0.9322... 876546 No
My Solution:
test['col3'] = [entry for tag in test['col2'] for entry in test['col1'] if tag in entry]
Getting error: ValueError: Length of values does not match length of index
Upvotes: 4
Views: 1791
Reputation: 2218
Using numpy where:
test['col3'] = test.apply(lambda x: np.where(str(x.col2) in [i[0] for i in x.col1],"yes", "no"), axis =1)
test['col3']
0 yes
1 no
Upvotes: 0
Reputation: 12406
You can do this using .apply():
def sublist_checker(row):
    check_both = ['Yes' if str(row['col2']) in sublist else 'No' for sublist in row['col1']]
    check_any = 'Yes' if 'Yes' in check_both else 'No'
    return check_any
test['col3'] = test.apply(sublist_checker, axis=1)
print(test)
col1 col2 col3
0 [(330420, 0.932249605656), (76546, 0.932200312614)] 76546 Yes
1 [(330420, 0.932249605656), (500826, 0.932200312614)] 876546 No
The function sublist_checker performs a row-wise check of each element in test['col2'] against each sub-list found in test['col1'] and returns Yes or No based on the presence or absence of that element in any of the sub-lists.
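The same row-wise check can be written more compactly with any(), which stops at the first match. This is only a minimal sketch, assuming col1 always holds (id string, score) tuples and col2 holds integers as in the question; the helper name has_match is hypothetical.

```python
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])

def has_match(row):
    # Hypothetical helper: compare col2 (as a string) with the first
    # element of every tuple in col1.
    target = str(row['col2'])
    found = any(ident == target for ident, _score in row['col1'])
    return 'Yes' if found else 'No'

test['col3'] = test.apply(has_match, axis=1)
print(test['col3'].tolist())
```

Comparing only the first tuple element avoids an accidental match against the score.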
Upvotes: 0
Reputation: 164783
You should avoid lists in series. Let's try a vectorised solution:
# df is the question's dataframe (named test there)
# extract array of values and reshape
arr = np.array(df.pop('col1').values.tolist()).reshape(-1, 4)
# join to dataframe and replace list of tuples
df = df.join(pd.DataFrame(arr, dtype=float))
# apply test via isin
df['test'] = df.drop(columns='col2').isin(df['col2']).any(axis=1)
print(df)
col2 0 1 2 3 test
0 76546 330420.0 0.93225 76546.0 0.9322 True
1 876546 330420.0 0.93225 500826.0 0.9322 False
Upvotes: 1
Reputation: 323356
Using any with zip:
[any([int(z[0])==y for z in x]) for x, y in zip(test.col1, test.col2)]
Out[227]: [True, False]
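To get the 'Yes'/'No' column the question asks for, the boolean list can be mapped with np.where. This is a rough sketch, assuming the same test dataframe as in the question; the variable name mask is only illustrative.

```python
import numpy as np
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])

# One boolean per row: does any tuple's id (cast to int) equal col2?
mask = [any(int(z[0]) == y for z in x) for x, y in zip(test['col1'], test['col2'])]

# Map True/False to the labels requested in the question.
test['col3'] = np.where(mask, 'Yes', 'No')
print(test['col3'].tolist())
```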
Upvotes: 4