NickBraunagel

Reputation: 1599

Accessing Pandas dataframe values when index values contained in separate numpy array

I have a numpy array of over 2 million ints:

a = np.array([324, 986, 574 ... 986, 1232, 3943])

Each element in a corresponds to an index value in a dataframe df with shape (1324, 4):

index A     B C D
0     'foo' 2 3 2
1     'bar' 2 4 8
...
1323  'foo' 2 5 8

I am trying to access the values of df.A using a list comprehension:

l = [df.A.loc[i] for i in a]

but this takes quite a long time to run. Is there a faster option? Maybe I need to do a join? Thank you.

Upvotes: 0

Views: 1716

Answers (2)

akuiper

Reputation: 214937

If the values in a correspond to labels in the data frame's index, you should be able to just use .loc[a]. If the values in a refer to positions, you need .iloc[a] instead. If you need a numpy array as the result, as @Scott commented, use df.A.loc[a].values:

df.A.loc[a]

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
        "A": ["a", "c", "b", "d"]
    })

a = np.array([0,3,2,2,1,1,0])

df.A.loc[a]
#0    a
#3    d
#2    b
#2    b
#1    c
#1    c
#0    a
#Name: A, dtype: object

df.A.loc[a].values
# array(['a', 'd', 'b', 'b', 'c', 'c', 'a'], dtype=object)
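The choice between .loc (label-based) and .iloc (position-based) only matters when the index is not the default 0..n-1 range. Here is a minimal sketch with a hypothetical index [10, 20, 30, 40]. It also checks that the vectorized lookup returns the same values as the list comprehension from the question. It uses .to_numpy(), which newer pandas versions recommend over .values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
df = pd.DataFrame({"A": ["a", "c", "b", "d"]}, index=[10, 20, 30, 40])

labels = np.array([40, 10, 10, 30])   # values that are index labels
positions = np.array([3, 0, 0, 2])    # values that are row positions

by_label = df.A.loc[labels].to_numpy()          # label-based lookup
by_position = df.A.iloc[positions].to_numpy()   # position-based lookup

# The slow per-element approach from the question, for comparison
slow = [df.A.loc[i] for i in labels]

print(list(by_label))     # ['d', 'a', 'a', 'b']
print(list(by_position))  # ['d', 'a', 'a', 'b']
print(list(by_label) == slow)  # True
```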

Upvotes: 6

Andrew L

Reputation: 7038

This can be done via boolean indexing:

a = np.array([324, 986, 574, 986, 1232, 3943])

df
   some_column
0            1
1            2
2            3
3            5
4          324
5          574
6          986
7         3943

df[df['some_column'].isin(a)]
   some_column
4          324
5          574
6          986
7         3943

df[df['some_column'].isin(a)].values
array([[ 324],
       [ 574],
       [ 986],
       [3943]], dtype=int64)
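The example above shows only the printed frame. Below is a self-contained version. It assumes the frame has a single integer column named some_column holding the values printed above:

```python
import numpy as np
import pandas as pd

a = np.array([324, 986, 574, 986, 1232, 3943])

# Rebuild the example frame printed above
df = pd.DataFrame({"some_column": [1, 2, 3, 5, 324, 574, 986, 3943]})

# Boolean mask: True for rows whose value appears anywhere in a
mask = df["some_column"].isin(a)
matched = df[mask]

print(matched.index.tolist())                 # [4, 5, 6, 7]
print(matched["some_column"].tolist())        # [324, 574, 986, 3943]
```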

Similarly, if the array values correspond to index labels:

df[df.index.isin(a)]
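Note that isin builds a filter rather than a lookup. It returns each matching row at most once, in the frame's own order, so duplicated or reordered values in a are not reflected in the result. Values of a that are absent from the index are silently dropped, whereas .loc raises a KeyError for them in recent pandas versions. If you need one result per element of a, as the list comprehension in the question produces, .loc[a] keeps both order and duplicates. Here is a small comparison on a hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "c", "b", "d"]})
a = np.array([0, 3, 2, 2, 1, 1, 0])

# Filter: rows whose index label is in a, each at most once, in df order
filtered = df[df.index.isin(a)]["A"].tolist()

# Lookup: one result per element of a, preserving order and duplicates
looked_up = df["A"].loc[a].tolist()

print(filtered)   # ['a', 'c', 'b', 'd']
print(looked_up)  # ['a', 'd', 'b', 'b', 'c', 'c', 'a']
```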

Upvotes: 1
