may
may

Reputation: 1185

Selecting values with a threshold pandas dataframe

I would like to make a list of the columns which for each of those index = [0,43,46] the values are higher than a certain value, for example, 1000.

My data frame looks like this but with more columns.

index           asset.assetStateId.10  asset.assetStateId.100  asset.assetStateId.101  \
0.0                  1057.0                     0.0                     0.0   
43.0                  380.0                  1441.0                     0.0   
46.0                    0.0                     0.0                  1441.0 

So, in the end, I would like to have three lists which the name of the column which the values are higher than 1000 for each row.

Thanks in advance,

I have tried so far this:

lista_colunas = list(result_1.columns.values)
lista_assets = []
for asset in lista_colunas:
    if result_1.loc[result_1[asset]>1000]:
        lista_assets += [asset]

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 1

Views: 2017

Answers (2)

Cid0rz
Cid0rz

Reputation: 43

A pythonic way of filtering a list by the values would be list comprehension

[item for item in list if item >1000]

will keep only the items bigger than 1000 but won't keep the empty slots between them. Depending on how you want to organize the program you can (creating it from the data frame) do a structure like:

for name, values in solution:
    name = [item for item in values if item>1000]

This supports also dot lookup so you could do:

   [item.value for item in values if item.value>1000]

There are many ways to filter

Upvotes: 1

jezrael
jezrael

Reputation: 862481

I think you can create dictionary of lists:

First select rows by loc if necessary, then reshape by stack and reset_index:

idx  = [0,43,46] 
df = df.loc[idx].stack().reset_index(name='val')
print (df)    
   Index                 level_1     val
0      0   asset.assetStateId.10  1057.0
1      0  asset.assetStateId.100     0.0
2      0  asset.assetStateId.101     0.0
3     43   asset.assetStateId.10   380.0
4     43  asset.assetStateId.100  1441.0
5     43  asset.assetStateId.101     0.0
6     46   asset.assetStateId.10     0.0
7     46  asset.assetStateId.100     0.0
8     46  asset.assetStateId.101  1441.0

Then filter by boolean indexing, groupby and create lists, last convert to dict:

a = df[df['val'] > 1000].groupby('index')['level_1'].apply(list).to_dict()

Another solution is loop by each row and by condition create lists:

a = df.loc[idx].apply(lambda x: x.index[x > 1000].tolist(), axis=1).to_dict()

print (a)
{0.0: ['asset.assetStateId.10'], 
43.0: ['asset.assetStateId.100'], 
46.0: ['asset.assetStateId.101']}

print (a[0])
['asset.assetStateId.10']

print (a[43])
['asset.assetStateId.100']

Upvotes: 2

Related Questions