Reputation: 4305
I am trying to iterate over the rows of a DataFrame in Julia to generate a new column for the data frame. I haven't come across a clear example of how to do this. In R this type of thing is vectorized, but from my understanding not all of Julia's operations are vectorized, so I need to loop over the rows. I know I can do this with indexing, but I believe there must be a better way. I want to be able to reference the column values by name. Here is what I have:
test_df = DataFrame( A = [1,2,3,4,5], B = [2,3,4,5,6])
test_df[!, "C"] = [test_df[i, "A"] * test_df[i, "B"] for i in 1:nrow(test_df)]
Is this the Julia/DataFrames way of doing this? Is there a more Julia-esque way of doing this? Thanks for any feedback.
Upvotes: 8
Views: 10424
Reputation: 1604
The better, already vectorized way to do what you want in your example would be:
test_df[!, "C"] = test_df[!, "A"] .* test_df[!, "B"]
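For reference, here is a minimal self-contained sketch of that vectorized approach, assuming a recent DataFrames.jl (1.x) where columns are selected with `df[!, "name"]` or `df.name`. The extra columns `D` and `E` are only illustrative names and are not part of the question.

```julia
using DataFrames

test_df = DataFrame(A = [1, 2, 3, 4, 5], B = [2, 3, 4, 5, 6])

# Broadcast the multiplication over two whole columns, selected by name.
test_df[!, "C"] = test_df[!, "A"] .* test_df[!, "B"]

# The same computation with property-style column access.
test_df.D = test_df.A .* test_df.B

# The same computation again, expressed as a row-wise transformation.
transform!(test_df, ["A", "B"] => ByRow(*) => "E")
```

All three forms should produce the same column, so which one to use is mostly a matter of taste.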
Now, if for some reason you can't vectorize your operations and you really want to loop over rows (unlikely...), you can do it as follows:
for row in eachrow( test_df )
# do something with row which is of type DataFrameRow
end
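As a rough sketch of what the loop body might look like for the example in the question, the following collects one product per row and then stores the result as column `C`. The `products` vector is a hypothetical helper name; fields of a `DataFrameRow` can be read either as `row.A` or as `row["B"]`.

```julia
using DataFrames

test_df = DataFrame(A = [1, 2, 3, 4, 5], B = [2, 3, 4, 5, 6])

# Collect one product per row, referring to the columns by name.
products = Int[]
for row in eachrow(test_df)
    push!(products, row.A * row["B"])
end

# Attach the collected values as a new column.
test_df[!, "C"] = products
```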
If you need the row index, do:
for (i, row) in enumerate( eachrow( test_df ) )
# do something with row and i
end
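A small sketch of how the index could be used: preallocate the output and write each row's result at position `i`. The name `result` is a hypothetical helper, not something from the question.

```julia
using DataFrames

test_df = DataFrame(A = [1, 2, 3, 4, 5], B = [2, 3, 4, 5, 6])

# Preallocate the output so each row's result can be written by index.
result = zeros(Int, nrow(test_df))
for (i, row) in enumerate(eachrow(test_df))
    result[i] = row.A * row.B
end

test_df[!, "C"] = result
```

Preallocating avoids growing the vector inside the loop, which can matter for large data frames.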
Upvotes: 6
Reputation: 2929
You'd be better off doing test_df[!, "A"] .* test_df[!, "B"]. In general, Julia uses a dot prefix to indicate operators that work elementwise. All of these elementwise operations are vectorized.
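To illustrate the dot convention on plain vectors, here is a short sketch. The variable names are made up for the example. In current Julia the same dot syntax also applies to ordinary function calls, written as `f.(x)`.

```julia
a = [1, 2, 3]
b = [4, 5, 6]

# Dotted operators apply element by element.
prod_ab = a .* b
sum_ab = a .+ b

# The dot also broadcasts ordinary functions over their arguments.
clamped = clamp.(prod_ab, 5, 15)
```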
You also don't want to use an Array comprehension since you probably want a DataArray as your output. There are no DataArray comprehensions for now since comprehensions are built into the Julia parser, which makes them hard to override in libraries like DataArrays.jl.
Upvotes: 5