Reputation: 117
Let's assume my dataframe's second column contains lists of integers:
df = pd.DataFrame({"col_1":[1,2,3,4,5],"col_2":[[1,2],np.nan,[3,5,9],[2],[8,5]],"col_3":np.nan})
Output:
   col_1      col_2  col_3
0      1     [1, 2]    NaN
1      2        NaN    NaN
2      3  [3, 5, 9]    NaN
3      4        [2]    NaN
4      5     [8, 5]    NaN
I'd like to insert a 1 in column 3 if the int in column 1 can be found in the list of ints in column 2:
   col_1      col_2  col_3
0      1     [1, 2]      1
1      2        NaN    NaN
2      3  [3, 5, 9]      1
3      4        [2]    NaN
4      5     [8, 5]      1
I was trying to solve it like this:
for i in range(0, len(df)):
    if df["col_1"][i] in df["col_2"][i]:
        df["col_3"][i] = 1
This gave me TypeError: argument of type 'float' is not iterable because of the NaN in column 2, and I couldn't work out a way to deal with it. (I've also tried a different solution based on .isin, but that wouldn't work because of AttributeError: 'list' object has no attribute 'isin'.)
I then had the idea to replace all the NaN in col_2 with a 0 so that my initial for loop would be able to run through. There are no 0s in col_1, and there never will be, so I'd be fine with that solution because it won't lead to wrong matches in col_3. To this end,
df.loc[df["col_2"].isnull(), "col_2"] = 0
is not enough, because the if ... in can't deal with ints: TypeError: argument of type 'int' is not iterable. I would need the 0 to be inserted as an element of a list, but you can't just use = [0] instead. I've tried different things based on .at, because it should be able to insert lists into cells, but I couldn't work it out.
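For reference, a minimal sketch of the .at idea described above: replacing each NaN in col_2 with the list [0] so the original loop can run (this relies on col_1 never containing 0, as stated):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

# .at can assign a list into a single cell, which a .loc
# boolean-mask assignment cannot do directly.
for i in df.index[df["col_2"].isna()]:
    df.at[i, "col_2"] = [0]

# Now every cell in col_2 is a list, so the membership test
# never hits a non-iterable float.
for i in range(len(df)):
    if df.at[i, "col_1"] in df.at[i, "col_2"]:
        df.at[i, "col_3"] = 1
```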
Thank you in advance for any advice!
Upvotes: 2
Views: 328
Reputation: 863166
You can filter out the NaNs with an if-else on x['col_2'] == x['col_2'], because NaN == NaN is False. Then map with a dictionary converts the True values to 1; False values are not in the dict, so NaN is returned for them:
f = lambda x: x['col_1'] in x['col_2'] if x['col_2'] == x['col_2'] else np.nan
df['col_3'] = df.apply(f, axis=1).map({True: 1})
print(df)
   col_1      col_2  col_3
0      1     [1, 2]    1.0
1      2        NaN    NaN
2      3  [3, 5, 9]    1.0
3      4        [2]    NaN
4      5     [8, 5]    1.0
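As a side note, the same row-wise check can be written with an explicit type test instead of the x == x trick (a sketch, not part of the original answer; a NaN cell is a float, not a list, so isinstance distinguishes the two):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

def check(row):
    # NaN cells in col_2 fail the isinstance test, so they
    # fall through to NaN without ever being iterated.
    if isinstance(row["col_2"], list) and row["col_1"] in row["col_2"]:
        return 1
    return np.nan

df["col_3"] = df.apply(check, axis=1)
```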
Or use DataFrame.dropna to remove the NaN rows first; when the result is assigned back as a new column, it is aligned on the index, so the removed rows come back as NaNs:
f = lambda x: x['col_1'] in x['col_2']
df['col_3'] = df.dropna(subset=['col_1', 'col_2']).apply(f, axis=1).map({True: 1})
print(df)
   col_1      col_2  col_3
0      1     [1, 2]    1.0
1      2        NaN    NaN
2      3  [3, 5, 9]    1.0
3      4        [2]    NaN
4      5     [8, 5]    1.0
Upvotes: 3
Reputation: 323326
Use
s = df.dropna(subset=['col_2', 'col_1'])
df['new'] = pd.DataFrame(s.col_2.tolist(), index=s.index).isin(df.col_1).sum(1).loc[lambda x: x != 0]
df
   col_1      col_2  col_3  new
0      1     [1, 2]    NaN  1.0
1      2        NaN    NaN  NaN
2      3  [3, 5, 9]    NaN  1.0
3      4        [2]    NaN  NaN
4      5     [8, 5]    NaN  1.0
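The one-liner can be unpacked step by step; the following is a commented sketch of the same logic (note that isin with a Series aligns on the index, so each row of the expanded frame is compared against that row's col_1 value, not against all of col_1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 2, 3, 4, 5],
    "col_2": [[1, 2], np.nan, [3, 5, 9], [2], [8, 5]],
    "col_3": np.nan,
})

# 1. Drop rows where col_1 or col_2 is NaN.
s = df.dropna(subset=["col_1", "col_2"])

# 2. Expand the lists into a wide DataFrame; shorter lists are
#    padded with NaN, and the original index is preserved.
wide = pd.DataFrame(s["col_2"].tolist(), index=s.index)

# 3. Index-aligned comparison against col_1: True wherever a
#    list element equals that row's col_1 value.
hits = wide.isin(df["col_1"])

# 4. Count matches per row and keep only rows with at least one;
#    assigning back fills the remaining rows with NaN.
df["new"] = hits.sum(axis=1).loc[lambda x: x != 0]
```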
Upvotes: 1