Reputation: 1185
I would like to make a list of the columns which for each of those index = [0,43,46] the values are higher than a certain value, for example, 1000.
My data frame looks like this but with more columns.
index asset.assetStateId.10 asset.assetStateId.100 asset.assetStateId.101 \
0.0 1057.0 0.0 0.0
43.0 380.0 1441.0 0.0
46.0 0.0 0.0 1441.0
So, in the end, I would like to have three lists which the name of the column which the values are higher than 1000 for each row.
Thanks in advance,
I have tried so far this:
lista_colunas = list(result_1.columns.values)
lista_assets = []
for asset in lista_colunas:
if result_1.loc[result_1[asset]>1000]:
lista_assets += [asset]
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Upvotes: 1
Views: 2017
Reputation: 43
A pythonic way of filtering a list by the values would be list comprehension
[item for item in list if item >1000]
will keep only the items bigger than 1000 but won't keep the empty slots between them. Depending on how you want to organize the program you can (creating it from the data frame) do a structure like:
for name, values in solution:
name = [item for item in values if item>1000]
This supports also dot lookup so you could do:
[item.value for item in values if item.value>1000]
There are many ways to filter
Upvotes: 1
Reputation: 862481
I think you can create dictionary of list
s:
First select rows by loc
if necessary, then reshape by stack
and reset_index
:
idx = [0,43,46]
df = df.loc[idx].stack().reset_index(name='val')
print (df)
Index level_1 val
0 0 asset.assetStateId.10 1057.0
1 0 asset.assetStateId.100 0.0
2 0 asset.assetStateId.101 0.0
3 43 asset.assetStateId.10 380.0
4 43 asset.assetStateId.100 1441.0
5 43 asset.assetStateId.101 0.0
6 46 asset.assetStateId.10 0.0
7 46 asset.assetStateId.100 0.0
8 46 asset.assetStateId.101 1441.0
Then filter by boolean indexing
, groupby
and create list
s, last convert to dict
:
a = df[df['val'] > 1000].groupby('index')['level_1'].apply(list).to_dict()
Another solution is loop by each row and by condition create list
s:
a = df.loc[idx].apply(lambda x: x.index[x > 1000].tolist(), axis=1).to_dict()
print (a)
{0.0: ['asset.assetStateId.10'],
43.0: ['asset.assetStateId.100'],
46.0: ['asset.assetStateId.101']}
print (a[0])
['asset.assetStateId.10']
print (a[43])
['asset.assetStateId.100']
Upvotes: 2