Reputation: 372
I'm trying to use NA as a result to indicate that the computed value for a given DataFrame "row" is meaningless (or perhaps can't be computed). How do I get a column with computed NAs that still responds to dropna?
Example:
using DataFrames
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])
# A value of 0 in column B should yield a foo of NA
function foo(d)
    if d[:B] == 0
        return NA
    end
    return d[:B] ./ d[:C] # vectorized to work with `by`
end
# What I'm looking for is something equivalent to this list
# comprehension, but that returns a DataFrame or DataArray
# since normal Arrays don't respond to `dropna`
comprehension = [foo(frame) for frame in eachrow(df)]
Upvotes: 4
Views: 280
Reputation: 372
You can do this...
using DataFramesMeta
result = @with(df, map(foo, :B, :C))
#=> DataArray{Any,1}: [0.2, NA, 0.667, 1.0]
...if foo can be rewritten to take individual values rather than an entire DataFrame:
function foo(b, c)
    if b == 0
        return NA
    end
    return b / c
end
Similarly, if you want a new DataFrame containing the new column, use @transform:
tdf = @transform(df, foo = map(foo, :B, :C))
#=> 4x4 DataFrame
# | Row | A | B | C | foo |
# |-----|---|---|---|----------|
# | 1 | 1 | 1 | 5 | 0.2 |
# | 2 | 2 | 0 | 4 | NA |
# | 3 | 3 | 2 | 3 | 0.666667 |
# | 4 | 4 | 3 | 3 | 1.0 |
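For readers on a recent DataFrames.jl release, where NA was replaced by missing, a rough equivalent might look like the sketch below. It assumes DataFrames 1.x and is not the DataArrays/DataFramesMeta API used above; the names clean and vals are only illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# Scalar version of foo: returns missing where the original returned NA
foo(b, c) = b == 0 ? missing : b / c

# Broadcast foo over the two columns and store the result as a new column
df.foo = foo.(df.B, df.C)

# dropmissing removes the rows whose foo value is missing
clean = dropmissing(df, :foo)

# skipmissing yields only the non-missing values of the column itself
vals = collect(skipmissing(df.foo))
```

Here dropmissing and skipmissing play the role that dropna plays in the original code.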
Upvotes: 0
Reputation: 372
One option is to extend Base.convert and DataArrays.dropna so that dropna can handle normal Vectors:
using DataFrames
function Base.convert{T}(::Type{DataArray}, v::Vector{T})
    da = DataArray(T[],Bool[])
    for val in v
        push!(da, val)
    end
    return da
end
function DataArrays.dropna(v::Vector)
    return dropna(convert(DataArray,v))
end
Now the example should work as expected:
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])
# A value of 0 in column B should yield a foo of NA
function foo(d)
    if d[:B] == 0
        return NA
    end
    return d[:B] / d[:C]
end
comprehension = [foo(frame) for frame in eachrow(df)]
dropna(comprehension) #=> Array{Any,1}: [0.2, 0.667, 1.]
Even without the extended dropna, the extended convert allows the comprehension to be inserted into the DataFrame as a new DataArray column, preserving NAs and their appropriate dropping behavior:
conv = convert(DataArray, comprehension)
insert!(df, size(df, 2) + 1, conv, :foo)
#=> 4x4 DataFrame
# | Row | A | B | C | foo |
# |-----|---|---|---|----------|
# | 1 | 1 | 1 | 5 | 0.2 |
# | 2 | 2 | 0 | 4 | NA |
# | 3 | 3 | 2 | 3 | 0.666667 |
# | 4 | 4 | 3 | 3 | 1.0 |
typeof(df[:foo]) #=> DataArray{Any,1} (constructor with 1 method)
dropna(df[:foo]) #=> Array{Any,1}: [0.2, 0.667, 1.]
Upvotes: 2
Reputation: 1550
This is a bit tricky since rows of dataframes are awkward objects. For example, I would think this would be entirely reasonable:
using DataFrames
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])
# A value of 0 in column B should yield a foo of NA
function foo(d)
    if d[:B] == 0
        return NA
    end
    return d[:B] / d[:C] # scalar division on a single row
end
comp = DataArray(Float64,4)
map!(r->foo(r), eachrow(df))
but this results in
`map!` has no method matching map!(::Function, ::DFRowIterator{DataFrame})
However, if you just want to do a by that doesn't always return a row, then you can do something like this:
using DataFrames
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])
# A value of 0 in column B returns an empty array
function foo(d)
    if d[1,:B] == 0
        return []
    end
    return d[1,:B] / d[1,:C] # Plan on only getting a single row in the by
end
by(df, [:A,:B,:C]) do d
    foo(d)
end
which results in
3x4 DataFrame
| Row | A | B | C | x1 |
|-----|---|---|---|----------|
| 1 | 1 | 1 | 5 | 0.2 |
| 2 | 3 | 2 | 3 | 0.666667 |
| 3 | 4 | 3 | 3 | 1.0 |
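The same "skip the rows that can't be computed" idea can be sketched in a recent DataFrames.jl, where by is no longer available. This assumes DataFrames 1.x; it swaps the grouped by call for a plain filter followed by transform, and the name kept is only illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# Keep only the rows where B is non-zero, i.e. where the ratio is meaningful
kept = filter(:B => !=(0), df)

# Compute B / C row by row and store it in a new column named x1
result = transform(kept, [:B, :C] => ByRow(/) => :x1)
```

Filtering first avoids having to return an empty value from the per-row function at all.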
Upvotes: 1