Lewis Cooper

Reputation: 73

NumPy where: keep the original data where the condition holds true

I have an array that I'm using np.where on. However, the conditional statement returns only the indices of the true values.

Here's the statement I'm using:

data = np.where(data.OType.str.contains("YSOc"))

Input data

        Region     RAJ2000    DEJ2000                OType
0       LUP_III  242.588882 -38.644272                 Zero
1       LUP_III  242.588882 -39.302551                  two
2       LUP_III  242.588882 -39.377029                 star
3       LUP_III  242.595749 -38.762226                  one
4       LUP_III  242.602203 -39.317276                  two
        ...         ...        ...                  ...
582347  LUP_III  242.174133 -39.026955  YSOc_star+dust(MP1)
582348  LUP_III  242.178635 -39.104069  YSOc_star+dust(IR4)
582349  LUP_III  242.199524 -38.833614  YSOc_star+dust(IR4)
582350  LUP_III  242.205811 -39.094246  YSOc_star+dust(IR2)
582351  LUP_III  242.214279 -39.091789  YSOc_star+dust(IR2)

Output

(array([  4350,   5726,   6432,   9324,  13815,  14139,  18445,  29680,
        32350,  37842,  37956,  39458,  40384,  42086,  42241,  70026,
        87998,  95434,  95680, 100641, 140513, 144178, 158947, 161837,
       184541, 187837, 198310, 215526, 218879, 222999, 230776, 232013,
       233383, 235072, 251165, 259407, 267365, 268906, 269205, 286646,
       290633, 291358, 313746, 313779, 315215, 329447, 330986, 336783,
       336831, 339249, 341296, 348079, 351279, 351764, 353540, 356300,
       357192, 363877, 379226, 385232, 385635, 386531, 388968, 389570,
       397586, 400390, 402026, 436435, 438384, 439781, 443509, 447030,
       447881, 459637, 459906, 460051, 460722, 461185, 461459, 461556,
       461655, 461993, 465299, 465743, 466993, 467071, 468263, 469951,
       470610, 471196, 472743, 475490, 475665, 476385, 478243, 478549,
       478599, 478998, 484449, 485657, 486718, 486820, 486851, 487030,
       487446, 489547, 501403, 502071, 506799, 507159, 510826, 511213,
       512757, 513549, 514043, 514117, 514189, 514353, 514611, 514672,
       518171, 518276, 519617, 522213, 532190, 538127, 542022, 542202,
       542283, 542368, 547522, 547810, 548793, 552908, 554167, 557280,
       559775, 561043, 561541, 562073, 562375, 562401, 562634, 562699,
       562928, 562958, 564007, 564567, 567201, 568651, 570026, 573017,
       579175, 580137, 580332, 580402, 580473, 581081, 582273, 582274,
       582275, 582276, 582277, 582278, 582279, 582280, 582281, 582282,
       582283, 582284, 582285, 582286, 582287, 582288, 582289, 582290,
       582291, 582292, 582293, 582294, 582295, 582296, 582297, 582298,
       582299, 582300, 582301, 582302, 582303, 582304, 582305, 582306,
       582307, 582308, 582309, 582310, 582311, 582312, 582313, 582314,
       582315, 582316, 582317, 582318, 582319, 582320, 582321, 582322,
       582323, 582324, 582325, 582326, 582327, 582328, 582329, 582330,
       582331, 582332, 582333, 582334, 582335, 582336, 582337, 582338,
       582339, 582340, 582341, 582342, 582343, 582344, 582345, 582346,
       582347, 582348, 582349, 582350, 582351]),)

How do I make the statement return the values from the original array rather than their indices?

Upvotes: 0

Views: 828

Answers (2)

ddejohn

Reputation: 8960

What you're attempting to do is called boolean indexing.

It looks like data is actually a pandas DataFrame. If so, you don't even need np.where:

data[data.OType.str.contains("YSOc")]
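As a minimal sketch of what that line does, here is the same filter on a small made-up DataFrame. The four rows are a hypothetical miniature of the table in the question, and the names mask and ysoc are only for illustration:

```python
import pandas as pd

# Hypothetical miniature of the table in the question (not the real data)
data = pd.DataFrame({
    "Region": ["LUP_III"] * 4,
    "RAJ2000": [242.588882, 242.595749, 242.174133, 242.178635],
    "DEJ2000": [-38.644272, -38.762226, -39.026955, -39.104069],
    "OType": ["Zero", "one", "YSOc_star+dust(MP1)", "YSOc_star+dust(IR4)"],
})

# Boolean Series with one True/False entry per row
mask = data["OType"].str.contains("YSOc")

# Indexing with the mask keeps the full rows where it is True
ysoc = data[mask]
print(ysoc)
```

If the OType column could contain missing values, passing na=False to str.contains should keep the mask purely boolean.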

If you take a look at the documentation for np.where, it points out that

When only condition is provided, this function is a shorthand for np.asarray(condition).nonzero()

The documentation for nonzero shows that it returns a tuple of arrays, one per dimension. Notice the ,) at the end of your output? The reason you got an invalid key error with the other answer is that you were indexing with a tuple of arrays instead of a plain array.
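A quick way to see the tuple for yourself, using a small made-up boolean array (condition and result are illustrative names):

```python
import numpy as np

# Hypothetical condition array, standing in for the str.contains result
condition = np.array([False, True, False, True])

result = np.where(condition)

print(type(result))  # a tuple, even for one-dimensional input
print(len(result))   # one array per dimension of the condition
print(result[0])     # the positions where the condition is True
```

The plain index array is the first element of that tuple, result[0].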

Upvotes: 2

Sandro Martens

Reputation: 115

Use data[np.where(data.OType.str.contains("YSOc"))]

This filters the original array by the indices given by the where function.
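Note that this form assumes data is a NumPy array. If data is a pandas DataFrame, as the column access data.OType suggests, a sketch of the index-based approach would take the first element of the tuple returned by np.where and pass it to .iloc. The small DataFrame below is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame in the question
data = pd.DataFrame({
    "OType": ["Zero", "YSOc_star+dust(IR2)", "two", "YSOc_star+dust(IR4)"],
})

# np.where returns a tuple; [0] extracts the array of row positions
positions = np.where(data["OType"].str.contains("YSOc"))[0]

# .iloc selects rows by integer position
rows = data.iloc[positions]
print(rows)

# On a plain one-dimensional NumPy array, the tuple can be used directly
otype = data["OType"].to_numpy()
values = otype[np.where(data["OType"].str.contains("YSOc"))]
print(values)
```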

Upvotes: 1
