Amay
Amay

Reputation: 104

Slicing Dataframe with elements as lists

My dataframe has list as elements and I want to have more efficient way to check for some conditions.

My dataframe looks like this

col_a   col_b
0   100 [1, 2, 3]
1   200 [2, 1]
2   300 [3]

I want to get only those rows which have 1 in col_b.

I have tried the naive way temp_list=list()

for i in range(len(df1.index)):    
    if 1 in df1.iloc[i,1]:
        temp_list.append(df1.iloc[i,0])

This takes a lot of time for big dataframes like this. How could I make the search more efficient for dataframes like this?

Upvotes: 2

Views: 81

Answers (4)

prosti
prosti

Reputation: 46409

I experimented with this approach:

df['col_b'] = df.apply(lambda x: eval(x['col_b']), axis = 1)  
s=df['col_b']
d = pd.get_dummies(s.apply(pd.Series).stack()).sum(level=0)
df = pd.concat([df, d], axis=1); 
print(df)
print('...')
print(df[1.0])

That gave me the indices like this at the end (column with the name 1.0 as number):

   id  col_a      col_b  1.0  2.0  3.0
0   1    100  (1, 2, 3)    1    1    1
1   2    200     (1, 2)    1    1    0
2   3    300          3    0    0    1
...
0    1
1    1
2    0
Name: 1.0, dtype: uint8

To printout the result:

df.loc[df[1.0]==1, ['id', 'col_a', 'col_b']]

Upvotes: 0

yatu
yatu

Reputation: 88275

You can a list comprehension to check if 1 is present in a given list, and use the result to perform boolean indexing on the dataframe:

df.loc[[1 in i for i in df.col_B ],:]

    col_a      col_B
0    100  [1, 2, 3]
1    200     [2, 1]

Here's another approach using sets:

df[df.col_B.ne(df.col_B.map(set).sub({1}).map(list))]

   col_a      col_B
0    100  [1, 2, 3]
1    200     [2, 1]

Upvotes: 1

jezrael
jezrael

Reputation: 863266

Use boolean indexing with list comprehension and loc for seelct column col_a:

a = df1.loc[[1 in x for x in df1['col_b']], 'col_a'].tolist()
print (a)
[100, 200]

If need select first column:

a = df1.iloc[[1 in x for x in df1['col_b']], 0].tolist()
print (a)
[100, 200]

If need all rows:

df2 = df1[[1 in x for x in df1['col_b']]]
print (df2)
   col_a      col_b
0    100  [1, 2, 3]
1    200     [2, 1]

Another solution with sets and isdisjoint:

df2 = df1[~df1['col_b'].map(set({1}).isdisjoint)]
print (df2)
   col_a      col_b
0    100  [1, 2, 3]
1    200     [2, 1]

Upvotes: 1

VnC
VnC

Reputation: 2026

df[df.col_b.apply(lambda x: 1 in x)]

Results in:

col_a   col_b
0   100 [1, 2, 3]
1   200 [2, 1]

Upvotes: 1

Related Questions