Reputation: 15002
I want to select rows in a particular order given in a list. For example
This dataframe
a=[['car',1],['bike',3],['jewel',2],['tv',5],['phone',6]]
df=pd.DataFrame(a,columns=['items','quantity'])
>>> df
items quantity
0 car 1
1 bike 3
2 jewel 2
3 tv 5
4 phone 6
I want to get the rows with this order ['tv','car','phone']
, that is, first row tv and then car and then phone. I tried this method but it doesn't maintain order
arr=['tv','car','phone']
df.loc[df['items'].isin(arr)]
items quantity
0 car 1
3 tv 5
4 phone 6
Upvotes: 19
Views: 12209
Reputation: 12098
Why not search for index, filter and re-order:
df['new_order'] = df['items'].apply(lambda x: arr.index(x) if x in arr else -1)
df_new = df[df['new_order']>=0].sort_values('new_order')
items quantity new_order
3 tv 5 0
0 car 1 1
4 phone 6 2
Upvotes: 0
Reputation: 221614
For all items to be chosen existing in input df
, here's one with searchsorted
and should be good on performance -
In [43]: sidx = df['items'].argsort()
In [44]: df.iloc[sidx[df['items'].searchsorted(['tv','car','phone'],sorter=sidx)]]
Out[44]:
items quantity
3 tv 5
0 car 1
4 phone 6
Upvotes: 4
Reputation: 150785
merge
to the rescue:
(pd.DataFrame({'items':['tv','car','phone']})
.merge(df, on='items')
)
Output:
items quantity
0 tv 5
1 car 1
2 phone 6
Upvotes: 5
Reputation: 3759
Here is another variety that uses .loc
.
# Move items to the index, select, then reset.
df.set_index("items").loc[arr].reset_index()
Or another that doesn't change the index.
df.loc[df.reset_index().set_index("items").loc[arr]["index"]]
Upvotes: 4
Reputation: 323316
IIUC Categorical
df=df.loc[df['items'].isin(arr)]
df.iloc[pd.Categorical(df['items'],categories=arr,ordered=True).argsort()]
Out[157]:
items quantity
3 tv 5
0 car 1
4 phone 6
Or reindex
:Notice only different is this will not save the pervious index and if the original index do matter , you should using Categorical
(Mentioned by Andy L, if you have duplicate in items ,reindex
will failed )
df.set_index('items').reindex(arr).reset_index()
Out[160]:
items quantity
0 tv 5
1 car 1
2 phone 6
Or loop via the arr
pd.concat([df[df['items']==x] for x in arr])
Out[171]:
items quantity
3 tv 5
0 car 1
4 phone 6
Upvotes: 7
Reputation: 25259
I would create a dictionary from arr
and map
it to items
and dropna
, sort_values
d = dict(zip(arr, range(len(arr))))
Out[684]: {'car': 1, 'phone': 2, 'tv': 0}
df.loc[df['items'].map(d).dropna().sort_values().index]
Out[693]:
items quantity
3 tv 5
0 car 1
4 phone 6
Upvotes: 4
Reputation: 402814
Here's a non-intrusive solution using Index.get_indexer
that doesn't involve setting the index:
df.iloc[pd.Index(df['items']).get_indexer(['tv','car','phone'])]
items quantity
3 tv 5
0 car 1
4 phone 6
Note that if this is going to become a frequent thing (by thing, I mean "indexing" with a list on a column), you're better off turning that column into an index. Bonus points if you sort it.
df2 = df.set_index('items')
df2.loc[['tv','car','phone']]
quantity
items
tv 5
car 1
phone 6
Upvotes: 19
Reputation: 71600
Why not:
>>> df.iloc[df.loc[df['items'].isin(arr), 'items'].apply(arr.index).sort_values().index]
items quantity
3 tv 5
0 car 1
4 phone 6
>>>
Upvotes: 0