usernumber

Reputation: 2186

Remove row from astropy table

I would like to remove rows that contain infs from an astropy table. Something like the following

for line in mytable:
    if float('inf') in line:
        mytable.remove(line)

except that I don't know what to use for a remove function.

In the documentation, it says how to remove a column, but not how to remove a row.

Upvotes: 1

Views: 3773

Answers (2)

Iguananaut

Reputation: 23306

This is a little bit faster than your answer, especially as the size of the table grows.

Here we make a mask of all rows that contain inf by or-ing together the per-column masks, then slice the full table just once:

>>> import numpy as np
>>> from astropy.table import Table
>>> table = Table({'a': [1, 2, 3], 'b': [1.0, np.inf, 3.0], 'c': [np.inf, 2.0, 3.0]})
>>> mask = np.logical_or.reduce([c == np.inf for c in table.columns.values()])
>>> table = table[~mask]
>>> table
<Table length=1>
  a      b       c
int64 float64 float64
----- ------- -------
    3     3.0     3.0

What we're doing in both cases is not really "removing rows" per se, because we're not modifying the original table. Rather, we're creating a new table as a copy of the original, with some rows omitted. Doing it your way is slower because it makes a new copy of the table for each column, whereas creating the mask first and then indexing makes a copy only once, no matter how many columns there are:

In [24]: %%timeit
    ...: table2 = table
    ...: for col in table.colnames:
    ...:     table2 = table2[table2[col] != float('inf')]
    ...:
327 µs ± 40.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [25]: %%timeit
    ...: mask = np.logical_or.reduce([c == np.inf for c in table.columns.values()])
    ...: table2 = table[~mask]
    ...:
    ...:
121 µs ± 7.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I suspect the difference is even more dramatic for a larger number of columns and/or rows.

Depending on what your use case is, you might also consider creating a masked table with per-column masks. This allows you to avoid removing data from the table, while still performing arithmetic operations on it that ignore the masked (here, infinite) values:

>>> table = Table({'a': [1, 2, 3], 'b': [1.0, np.inf, 3.0], 'c': [np.inf, 2.0, 3.0]}, masked=True)
>>> for col in table.columns.values():
...     col.mask = (col == np.inf)
...
>>> table
<Table masked=True length=3>
  a      b       c
int64 float64 float64
----- ------- -------
    1     1.0      --
    2      --     2.0
    3     3.0     3.0
>>> table['b'].mean()
2.0

Upvotes: 2

usernumber

Reputation: 2186

Doing the following seems to work

for col in mytable.colnames:
    mytable = mytable[mytable[col] != float('inf')]

Upvotes: 0
