Reputation: 73
I have an array that I'm using np.where on. However, the conditional statement only returns the indices of the true values.
Here's the statement I'm using:
data = np.where(data.OType.str.contains("YSOc"))
Input data
Region RAJ2000 DEJ2000 OType
0 LUP_III 242.588882 -38.644272 Zero
1 LUP_III 242.588882 -39.302551 two
2 LUP_III 242.588882 -39.377029 star
3 LUP_III 242.595749 -38.762226 one
4 LUP_III 242.602203 -39.317276 two
... ... ... ...
582347 LUP_III 242.174133 -39.026955 YSOc_star+dust(MP1)
582348 LUP_III 242.178635 -39.104069 YSOc_star+dust(IR4)
582349 LUP_III 242.199524 -38.833614 YSOc_star+dust(IR4)
582350 LUP_III 242.205811 -39.094246 YSOc_star+dust(IR2)
582351 LUP_III 242.214279 -39.091789 YSOc_star+dust(IR2)
Output
(array([ 4350, 5726, 6432, 9324, 13815, 14139, 18445, 29680,
32350, 37842, 37956, 39458, 40384, 42086, 42241, 70026,
87998, 95434, 95680, 100641, 140513, 144178, 158947, 161837,
184541, 187837, 198310, 215526, 218879, 222999, 230776, 232013,
233383, 235072, 251165, 259407, 267365, 268906, 269205, 286646,
290633, 291358, 313746, 313779, 315215, 329447, 330986, 336783,
336831, 339249, 341296, 348079, 351279, 351764, 353540, 356300,
357192, 363877, 379226, 385232, 385635, 386531, 388968, 389570,
397586, 400390, 402026, 436435, 438384, 439781, 443509, 447030,
447881, 459637, 459906, 460051, 460722, 461185, 461459, 461556,
461655, 461993, 465299, 465743, 466993, 467071, 468263, 469951,
470610, 471196, 472743, 475490, 475665, 476385, 478243, 478549,
478599, 478998, 484449, 485657, 486718, 486820, 486851, 487030,
487446, 489547, 501403, 502071, 506799, 507159, 510826, 511213,
512757, 513549, 514043, 514117, 514189, 514353, 514611, 514672,
518171, 518276, 519617, 522213, 532190, 538127, 542022, 542202,
542283, 542368, 547522, 547810, 548793, 552908, 554167, 557280,
559775, 561043, 561541, 562073, 562375, 562401, 562634, 562699,
562928, 562958, 564007, 564567, 567201, 568651, 570026, 573017,
579175, 580137, 580332, 580402, 580473, 581081, 582273, 582274,
582275, 582276, 582277, 582278, 582279, 582280, 582281, 582282,
582283, 582284, 582285, 582286, 582287, 582288, 582289, 582290,
582291, 582292, 582293, 582294, 582295, 582296, 582297, 582298,
582299, 582300, 582301, 582302, 582303, 582304, 582305, 582306,
582307, 582308, 582309, 582310, 582311, 582312, 582313, 582314,
582315, 582316, 582317, 582318, 582319, 582320, 582321, 582322,
582323, 582324, 582325, 582326, 582327, 582328, 582329, 582330,
582331, 582332, 582333, 582334, 582335, 582336, 582337, 582338,
582339, 582340, 582341, 582342, 582343, 582344, 582345, 582346,
582347, 582348, 582349, 582350, 582351]),)
How do I make the statement return the matching values from the original array rather than their indices?
Upvotes: 0
Views: 828
Reputation: 8960
What you're attempting to do is called boolean indexing.
It looks like data is actually a pandas DataFrame. If so, you don't even need np.where:
data[data.OType.str.contains("YSOc")]
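To make the boolean indexing above concrete, here is a minimal sketch on a made-up four-row frame. The rows are hypothetical stand-ins for the table in the question, not the real data; only the column name OType and the "YSOc" pattern come from the original post.

```python
import pandas as pd

# Hypothetical miniature of the table in the question (values are illustrative only)
data = pd.DataFrame({
    "Region": ["LUP_III"] * 4,
    "OType": ["Zero", "two", "YSOc_star+dust(MP1)", "YSOc_star+dust(IR4)"],
})

# str.contains returns a boolean Series aligned with the frame's index
mask = data.OType.str.contains("YSOc")

# Indexing the frame with that Series keeps only the rows where the mask is True
filtered = data[mask]
print(filtered)
```

If OType could contain missing values, passing na=False to str.contains should keep the mask strictly boolean; that is an assumption about the data, since the question does not say whether any are missing.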
If you take a look at the documentation for np.where, it points out that:
"When only condition is provided, this function is a shorthand for np.asarray(condition).nonzero()."
The documentation for nonzero shows that the return type is a tuple of arrays, one per dimension. Notice the ,) at the end of your output? The reason you got an invalid key error from the other answer is that you were indexing with a tuple of arrays instead of a plain array.
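If you do want to go through np.where, one way to avoid the tuple problem is to take the first element of the result and index by position. The sketch below assumes a small hypothetical frame and uses .iloc for positional lookup; it is one possible approach, not necessarily what the other answer intended.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the OType column name comes from the question
data = pd.DataFrame({
    "OType": ["Zero", "YSOc_star+dust(MP1)", "two", "YSOc_star+dust(IR4)"],
})

# With a single argument, np.where returns a tuple holding one index array per dimension
result = np.where(data.OType.str.contains("YSOc"))

# The Series is one-dimensional, so the row positions are the tuple's only element
positions = result[0]

# .iloc selects rows by integer position, which is what np.where produced
rows = data.iloc[positions]
print(rows)
```

For a plain filter the boolean mask is simpler; the positional route is mainly useful when the integer positions are needed for something else as well.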
Upvotes: 2
Reputation: 115
Use data[np.where(data.OType.str.contains("YSOc"))]
This filters the original array by the indices given by the where function.
Upvotes: 1