Reputation: 1599
I have a numpy array of over 2 million ints:
a = np.array([324, 986, 574 ... 986, 1232, 3943])
Each element in a corresponds to an index value in a dataframe df with shape (1324, 4):
index  A      B  C  D
0      'foo'  2  3  2
1      'bar'  2  4  8
...
1323   'foo'  2  5  8
I am trying to access the values of df.A using a list comprehension:
l = [df.A.loc[i] for i in a]
but this is taking quite a long time to run. Is there a faster option; maybe I need to do a join? Thank you.
Upvotes: 0
Views: 1716
Reputation: 214937
If the values in a correspond to labels in the data frame's index, you should be able to just use .loc[a]; if they refer to positions, you need .iloc[a]; if you need a numpy array as the result, as commented by @Scott, use df.A.loc[a].values:
df.A.loc[a]
Example:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "c", "b", "d"]
})
a = np.array([0,3,2,2,1,1,0])
df.A.loc[a]
#0 a
#3 d
#2 b
#2 b
#1 c
#1 c
#0 a
#Name: A, dtype: object
df.A.loc[a].values
# array(['a', 'd', 'b', 'b', 'c', 'c', 'a'], dtype=object)
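To make the .loc versus .iloc distinction concrete, here is a minimal sketch. It assumes a small made-up frame with a non-default index (the labels 10 to 13 and the names labels, positions, slow, by_label and by_position are illustrative, not from the question). It also checks that the single vectorized lookup returns the same values as the list comprehension from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a non-default index, so labels and positions differ.
df = pd.DataFrame({"A": ["a", "c", "b", "d"]}, index=[10, 11, 12, 13])

labels = np.array([10, 13, 12, 12, 11])   # values that match index labels
positions = np.array([0, 3, 2, 2, 1])     # 0-based row positions

# Per-element lookup, as in the question (one .loc call per element).
slow = [df.A.loc[i] for i in labels]

# One vectorized lookup by label, converted to a numpy array.
by_label = df.A.loc[labels].to_numpy()

# One vectorized lookup by position.
by_position = df.A.iloc[positions].to_numpy()

print(slow == by_label.tolist())               # same values, same order
print(by_label.tolist() == by_position.tolist())
```

In recent pandas versions .to_numpy() does the same job here as .values in the answer above. With the default 0..n-1 index from the question, labels and positions coincide, so .loc[a] and .iloc[a] would return the same rows.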
Upvotes: 6
Reputation: 7038
This can be done via boolean indexing:
a = np.array([324, 986, 574, 986, 1232, 3943])
df
   some_column
0            1
1            2
2            3
3            5
4          324
5          574
6          986
7         3943
df[df['some_column'].isin(a)]
   some_column
4          324
5          574
6          986
7         3943
df[df['some_column'].isin(a)].values
array([[ 324],
       [ 574],
       [ 986],
       [3943]], dtype=int64)
Similarly, if the array values correspond to the index:
df[df.index.isin(a)]
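One caveat worth checking before choosing between the two approaches: a boolean mask from isin keeps each matching row once, in the frame's own order, whereas .loc[a] returns one value per element of a, in the order of a. The sketch below reuses the small example frame from the earlier answer (the names masked and looked_up are illustrative) to show the difference when a contains repeats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "c", "b", "d"]})
a = np.array([0, 3, 2, 2, 1, 1, 0])   # contains repeated index values

# Boolean mask: each matching row appears once, in df's row order.
masked = df.A[df.index.isin(a)]

# Label lookup: one result per element of a, in a's order.
looked_up = df.A.loc[a]

print(len(masked), len(looked_up))
print(masked.tolist())
print(looked_up.tolist())
```

If the goal is one value of df.A for each of the 2 million entries in a, as in the question, the .loc form is probably the one that matches; isin fits better when only the set of matching rows is needed.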
Upvotes: 1