J. Blauvelt

Reputation: 785

Julia DataFrame ERROR: BoundsError attempt to access attempt to access a data frame with X rows at index false

I'm trying to subset a DataFrame in Julia as follows:

df = DataFrame(a=[1,2,3], b=["x", "y", "z"])
df2 = df[df.a == 2, :]

I'd expect to get back just the second row, but instead I get an error:

ERROR: BoundsError: attempt to access "attempt to access a data frame with 3 rows at index false"

What does this error mean and how do I subset the DataFrame?

Upvotes: 1

Views: 1221

Answers (2)

Bogumił Kamiński

Reputation: 69819

Just to mention other options, note that you can also use the filter function here:

julia> filter(row -> row.a == 2, df)
1×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 2     │ y      │
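filter calls the predicate once per row and keeps the rows for which it returns true. As a rough plain-Julia analogue (an illustration only, not how DataFrames implements filter), a vector of NamedTuples can stand in for the rows, because row.a works on a NamedTuple as well:

```julia
# Hypothetical stand-in for the DataFrame: one NamedTuple per row.
rows = [(a = 1, b = "x"), (a = 2, b = "y"), (a = 3, b = "z")]

# The predicate is called on each row; rows where it returns true are kept.
kept = filter(row -> row.a == 2, rows)

println(kept)  # a 1-element vector holding the row with a = 2
```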

or

julia> df[findall(==(2), df.a), :]
1×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 2     │ y      │
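When building a row selector, keep in mind that filter returns the matching values, while findall returns their positions, and only positions are valid row indices. With the question's data the two coincide because a is [1,2,3]; with other values they do not. A plain-vector sketch (the vectors a and b below are made up for illustration):

```julia
a = [10, 20, 30]           # hypothetical column values
b = ["x", "y", "z"]        # a second column, aligned with a

vals = filter(==(20), a)   # the matching values: [20]
idx = findall(==(20), a)   # the positions of the matches: [2]

println(b[idx])            # indexing by position selects the matching "row"
# b[vals] would try to read b[20] and throw a BoundsError.
```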

Upvotes: 2

J. Blauvelt

Reputation: 785

Fortunately, you only need to add one character: a dot (.). The dot enables broadcasting, i.e. elementwise application, for any Julia function, even operators like ==, which becomes .==. Therefore, your code would be as follows:

using DataFrames

df = DataFrame(a=[1,2,3], b=["x", "y", "z"])
df2 = df[df.a .== 2, :]  # .== compares each element of df.a with 2
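The same dot works for ordinary functions, written f.(x), as well as for operators, written .op. A small sketch on a plain vector (square is a hypothetical helper defined only for illustration):

```julia
v = [1, 2, 3]

square(x) = x^2             # hypothetical scalar function
squares = square.(v)        # f.(x) applies square to each element: [1, 4, 9]
evens = iseven.(v)          # works for Base functions too: [false, true, false]
shifted = v .+ 10           # dotted operators broadcast as well: [11, 12, 13]
```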

Without the broadcast, the expression df.a == 2 compares the whole vector [1,2,3], as a single object, to the scalar 2. A vector is never == to a scalar (even [2] == 2 is false), so the expression returns a single false rather than one result per row.
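The difference is easy to see on a plain vector standing in for df.a:

```julia
v = [1, 2, 3]

whole = v == 2      # compares the vector as one object with 2: false
single = [2] == 2   # still false: a vector never equals a scalar
mask = v .== 2      # compares each element with 2: one Bool per element

println(whole)      # false
println(mask)
```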

The error tells you that you're trying to index the DataFrame's rows with false, and a lone Bool is not a valid row index for a DataFrame with 3 rows. Broadcasting with . instead produces a 3-element Bool vector that is true exactly where a equals 2, and indexing the rows with a Bool vector whose length matches the number of rows keeps only the rows marked true.
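Plain Julia vectors follow the same rule, so the behaviour can be sketched without DataFrames: a Bool mask selects the elements where it is true, but only if its length matches; a mask of the wrong length throws a BoundsError. The names col and mask are illustrative:

```julia
col = ["x", "y", "z"]
mask = [1, 2, 3] .== 2          # one Bool per element of col

selected = col[mask]            # keeps elements where mask is true: ["y"]

# A mask whose length does not match the vector is rejected with a BoundsError.
mismatch_err = try
    col[[true, false]]
    nothing
catch e
    e
end
println(mismatch_err isa BoundsError)   # true
```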

For more on broadcasting, see the Broadcasting section of the official Julia documentation.

Upvotes: 0
