Reputation: 785
I'm trying to subset a DataFrame in Julia as follows:
df = DataFrame(a=[1,2,3], b=["x", "y", "z"])
df2 = df[df.a == 2, :]
I'd expect to get back just the second row, but instead I get an error:
ERROR: BoundsError: attempt to access "attempt to access a data frame with 3 rows at index false"
What does this error mean and how do I subset the DataFrame?
Upvotes: 1
Views: 1221
Reputation: 69819
Just to mention other options: you can also use the filter function here:
julia> filter(row -> row.a == 2, df)
1×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 2 │ y │
or
julia> df[findall(==(2), df.a), :]
1×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 2 │ y │
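One thing worth keeping in mind when indexing rows by a column: on a plain vector, filter returns the matching values, while findall returns their positions, and only positions (or a Boolean mask) are generally safe as row indices. A minimal sketch using only Base Julia, with a hypothetical vector v that is not part of the question:

```julia
# Hypothetical data: the values deliberately differ from their positions.
v = [10, 20, 30]

# filter keeps the elements that satisfy the predicate (values).
vals = filter(==(20), v)

# findall returns the indices where the predicate holds (positions).
pos = findall(==(20), v)

# Indexing with positions picks the matching elements.
picked = v[pos]

println(vals)    # matching values
println(pos)     # matching positions
println(picked)  # elements selected by position
```

In the question's data the value 2 happens to sit at position 2, so both approaches select the same row there.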
Upvotes: 2
Reputation: 785
Fortunately, you only need to add one character: a dot (.). The . character enables broadcasting on any Julia function, even operators like ==. Therefore, your code would be as follows:
df = DataFrame(a=[1,2,3], b=["x", "y", "z"])
df2 = df[df.a .== 2, :]
Without the broadcast, the clause df.a == 2 returns false because it's literally comparing the Array [1,2,3], as a whole unit, to the scalar value 2. An Array of size (3,) is never equal to a scalar without broadcasting, so that clause just returns a single false.
The error you're getting tells you that you're trying to access the DataFrame at index false, which is not a valid index for a DataFrame with 3 rows. By broadcasting with ., you're now creating a Boolean vector (a BitVector) of length 3, which is a valid way to index a DataFrame with 3 rows.
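The difference between == and .== can be checked on a plain vector, without DataFrames at all. This is a minimal sketch using only Base Julia; the variable names whole, mask and selected are just illustrative:

```julia
a = [1, 2, 3]

# Plain == compares the whole vector to the scalar and yields a single Bool.
whole = (a == 2)

# Broadcast .== compares element by element and yields one Bool per element.
mask = (a .== 2)

# A Boolean mask with the same length as the vector is a valid index.
selected = a[mask]

println(whole)     # a single Bool
println(mask)      # one Bool per element
println(selected)  # elements where the mask is true
```

The same mask is what df[df.a .== 2, :] passes as the row index.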
For more on broadcasting, see the official Julia documentation here.
Upvotes: 0