Reputation: 104
My dataframe has list as elements and I want to have more efficient way to check for some conditions.
My dataframe looks like this
col_a col_b
0 100 [1, 2, 3]
1 200 [2, 1]
2 300 [3]
I want to get only those rows which have 1 in col_b.
I have tried the naive way temp_list=list()
for i in range(len(df1.index)):
if 1 in df1.iloc[i,1]:
temp_list.append(df1.iloc[i,0])
This takes a lot of time for big dataframes like this. How could I make the search more efficient for dataframes like this?
Upvotes: 2
Views: 81
Reputation: 46409
I experimented with this approach:
df['col_b'] = df.apply(lambda x: eval(x['col_b']), axis = 1)
s=df['col_b']
d = pd.get_dummies(s.apply(pd.Series).stack()).sum(level=0)
df = pd.concat([df, d], axis=1);
print(df)
print('...')
print(df[1.0])
That gave me the indices like this at the end (column with the name 1.0
as number):
id col_a col_b 1.0 2.0 3.0
0 1 100 (1, 2, 3) 1 1 1
1 2 200 (1, 2) 1 1 0
2 3 300 3 0 0 1
...
0 1
1 1
2 0
Name: 1.0, dtype: uint8
To printout the result:
df.loc[df[1.0]==1, ['id', 'col_a', 'col_b']]
Upvotes: 0
Reputation: 88275
You can a list comprehension to check if 1
is present in a given list, and use the result to perform boolean indexing
on the dataframe:
df.loc[[1 in i for i in df.col_B ],:]
col_a col_B
0 100 [1, 2, 3]
1 200 [2, 1]
Here's another approach using sets
:
df[df.col_B.ne(df.col_B.map(set).sub({1}).map(list))]
col_a col_B
0 100 [1, 2, 3]
1 200 [2, 1]
Upvotes: 1
Reputation: 863266
Use boolean indexing
with list comprehension and loc for seelct column col_a
:
a = df1.loc[[1 in x for x in df1['col_b']], 'col_a'].tolist()
print (a)
[100, 200]
If need select first column:
a = df1.iloc[[1 in x for x in df1['col_b']], 0].tolist()
print (a)
[100, 200]
If need all rows:
df2 = df1[[1 in x for x in df1['col_b']]]
print (df2)
col_a col_b
0 100 [1, 2, 3]
1 200 [2, 1]
Another solution with set
s and isdisjoint
:
df2 = df1[~df1['col_b'].map(set({1}).isdisjoint)]
print (df2)
col_a col_b
0 100 [1, 2, 3]
1 200 [2, 1]
Upvotes: 1
Reputation: 2026
df[df.col_b.apply(lambda x: 1 in x)]
Results in:
col_a col_b
0 100 [1, 2, 3]
1 200 [2, 1]
Upvotes: 1