Reputation: 21355
I have a dataframe (df) which looks like this
category | amount | freq
green 10 1
blue 5 2
orange 7 3
purple 5 4
I want to select only the 'frequency' and 'amount' columns, and all rows except the purple one
I know I can use df.ix to select the columns like this
df.ix[['green','blue','orange'],['freq','amount']]
However, how do you get the unique values in the category column, and select the columns which are not purple?
df.set_index(['category'])
Update
See Roman Pekar's solution for filtering out rows that you don't want.
For multiple rows create a series or a list (i.e. account_group) and reference it like this.
names = sorted_data[sorted_data.account.isin(account_group)]
Done this way names is a dataframe.
However this is similar but incorrect syntax, this will return a series.
names = sorted_data['account'].isin(account_group)
Upvotes: 1
Views: 955
Reputation: 117400
>>> df
category amount freq
0 green 10 1
1 blue 5 2
2 orange 7 3
3 purple 5 4
>>> df[df['category'] != 'purple'][['amount','freq']]
amount freq
0 10 1
1 5 2
2 7 3
update not sure if I understood OP correctly, but he wants also do it by subtracting lists: the first list is all the rows in the dataframe, the second is purple, and the third would be list-one minus list-two which would be green, blue, orange
. Hence another solution:
>>> l1
['green', 'blue', 'orange', 'purple']
>>> l2
['purple']
>>> l3 = [x for x in l1 if x not in l2]
>>> l3
['green', 'blue', 'orange']
>>> df[df['category'].isin(l3)][['amount','freq']]
amount freq
0 10 1
1 5 2
2 7 3
Upvotes: 2