yoshiserry
yoshiserry

Reputation: 21355

slice pandas dataframe by column showing all except the column provided

I have a dataframe (df) which looks like this

category | amount | freq
green         10     1
blue          5      2
orange        7      3
purple        5      4

I want to select only the 'frequency' and 'amount' columns, and all rows except the purple one

I know I can use df.ix to select the columns like this

df.ix[['green','blue','orange'],['freq','amount']]

However, how do you get the unique values in the category column, and select the columns which are not purple?

df.set_index(['category'])

Update

See Roman Pekar's solution for filtering out rows that you don't want.

For multiple rows create a series or a list (i.e. account_group) and reference it like this.

names = sorted_data[sorted_data.account.isin(account_group)]

Done this way names is a dataframe.

However this is similar but incorrect syntax, this will return a series.

names = sorted_data['account'].isin(account_group)

Upvotes: 1

Views: 955

Answers (1)

roman
roman

Reputation: 117400

>>> df
  category  amount  freq
0    green      10     1
1     blue       5     2
2   orange       7     3
3   purple       5     4

>>> df[df['category'] != 'purple'][['amount','freq']]
   amount  freq
0      10     1
1       5     2
2       7     3

update not sure if I understood OP correctly, but he wants also do it by subtracting lists: the first list is all the rows in the dataframe, the second is purple, and the third would be list-one minus list-two which would be green, blue, orange. Hence another solution:

>>> l1
['green', 'blue', 'orange', 'purple']
>>> l2
['purple']
>>> l3 = [x for x in l1 if x not in l2]
>>> l3
['green', 'blue', 'orange']
>>> df[df['category'].isin(l3)][['amount','freq']]
   amount  freq
0      10     1
1       5     2
2       7     3

Upvotes: 2

Related Questions