Matthew Crews

Reputation: 4305

Julia iterate over rows of dataframe

I am trying to iterate over the rows of a DataFrame in Julia to generate a new column for the data frame. I haven't come across a clear example of how to do this. In R this type of thing is vectorized, but from my understanding not all of Julia's operations are vectorized, so I need to loop over the rows. I know I can do this with indexing, but I believe there must be a better way. I want to be able to reference the column values by name. Here is what I have:

using DataFrames
test_df = DataFrame(A = [1, 2, 3, 4, 5], B = [2, 3, 4, 5, 6])
# build C by indexing each row position and multiplying A by B
test_df[!, "C"] = [test_df[i, "A"] * test_df[i, "B"] for i in 1:nrow(test_df)]

Is this the Julia/DataFrames way of doing this? Is there a more Julia-esque way of doing it? Thanks for any feedback.

Upvotes: 8

Views: 10424

Answers (2)

Mateo

Reputation: 1604

The better, already vectorized way to do what you want in your example would be:

test_df[!, "C"] = test_df[!, "A"] .* test_df[!, "B"]
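A minimal runnable sketch of the same idea, assuming a recent DataFrames.jl (1.x), where test_df[!, "A"] and test_df.A both return the column vector itself. It also shows transform! with ByRow, which DataFrames.jl provides for building a derived column from a row-wise function; the extra column name D is only for illustration.

```julia
using DataFrames

test_df = DataFrame(A = [1, 2, 3, 4, 5], B = [2, 3, 4, 5, 6])

# Broadcast `*` over the two column vectors: one product per row.
test_df[!, "C"] = test_df[!, "A"] .* test_df[!, "B"]

# Equivalent result via transform!: ByRow(*) calls *(a, b) once per row
# with that row's A and B values and stores the results in column D.
transform!(test_df, [:A, :B] => ByRow(*) => :D)

println(test_df.C)  # prints [2, 6, 12, 20, 30]
```

Both forms avoid writing an explicit index loop; which one reads better is mostly a matter of taste.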

Now, if for some reason you can't vectorize your operation and really want to loop over rows (unlikely...), then you can do the following:

for row in eachrow(test_df)
    # do something with row, which is of type DataFrameRow
end
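For instance, here is a sketch that fills in the loop body to build the same C column row by row. Each DataFrameRow supports access by column name (row.A), which covers the "reference the column values by name" part of the question; the accumulator name C is illustrative.

```julia
using DataFrames

test_df = DataFrame(A = [1, 2, 3, 4, 5], B = [2, 3, 4, 5, 6])

C = Int[]                       # accumulator for the new column
for row in eachrow(test_df)
    # `row` is a view of one row; its fields are looked up by column name
    push!(C, row.A * row.B)
end
test_df.C = C                   # attach the result as a new column
```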

If you also need the row index, do:

for (i, row) in enumerate(eachrow(test_df))
    # do something with row and i
end
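When the index is available, one common pattern is to preallocate the output and write into it by position instead of growing it with push!. A sketch, assuming the product example from the question; C is again an illustrative name.

```julia
using DataFrames

test_df = DataFrame(A = [1, 2, 3, 4, 5], B = [2, 3, 4, 5, 6])

# One slot per row; `undef` leaves the contents uninitialized until written.
C = Vector{Int}(undef, nrow(test_df))
for (i, row) in enumerate(eachrow(test_df))
    C[i] = row.A * row.B        # i is the 1-based row number
end
test_df.C = C
```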

Upvotes: 6

John Myles White

Reputation: 2929

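You'd be better off doing test_df[!, "A"] .* test_df[!, "B"], which multiplies the whole columns at once rather than one row at a time. In general, Julia uses a dot to indicate elementwise operations: as a prefix on operators such as .* and .+, and as a suffix on function calls such as f.(x). All of these elementwise operations are vectorized.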
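The dot syntax is not limited to built-in operators: any scalar function can be applied elementwise by writing f.(args...), and the @. macro adds a dot to every call and operator in an expression. A sketch, where area is a hypothetical helper and D an illustrative column name.

```julia
using DataFrames

test_df = DataFrame(A = [1, 2, 3, 4, 5], B = [2, 3, 4, 5, 6])

# Hypothetical scalar helper; it only knows how to handle two numbers.
area(a, b) = a * b

# The dot suffix applies `area` elementwise across both columns.
test_df.C = area.(test_df.A, test_df.B)

# `@.` rewrites the expression as A .* B .+ 1
A, B = test_df.A, test_df.B
test_df.D = @. A * B + 1
```

Because broadcasting fuses the dotted calls into a single loop, the @. version does not allocate a temporary array for A .* B.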

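You also don't want to use an Array comprehension, since you probably want a DataArray as your output. There are no DataArray comprehensions for now, because comprehensions are built into the Julia parser, which makes them hard to override in libraries like DataArrays.jl. (DataArrays.jl has since been deprecated: current versions of DataFrames.jl store columns as ordinary Vectors and represent absent values with missing, so an ordinary comprehension now produces a perfectly good column.)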
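In current DataFrames.jl, which no longer relies on DataArrays.jl, a column is a plain Vector whose element type can include Missing. A sketch under that assumption, with made-up values, showing that broadcasting propagates missing and that a comprehension over zipped columns can be assigned directly as a column.

```julia
using DataFrames

df = DataFrame(A = [1, missing, 3], B = [2, 3, 4])

# Broadcasting: missing * 3 is missing, so the gap propagates.
df.C = df.A .* df.B

# A comprehension over the zipped columns yields an equivalent column.
df.D = [a * b for (a, b) in zip(df.A, df.B)]
```

Note that comparing such columns needs isequal rather than ==, because == returns missing whenever either side contains missing.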

Upvotes: 5
