Reputation: 2005
I am trying to add columns of mismatched length (number of rows) to a DataFrame, but it throws this error:
DimensionMismatch("length of new column Target which is 60000 must match the number of rows in data frame (47040000)")
My code snippet is:
df = DataFrame(:Feature => train_x, :Target => train_y)
#train_x has 47040000 rows
#train_y has 60000 rows
Please suggest a solution for this problem. Thank you in advance.
Upvotes: 6
Views: 1218
Reputation: 42264
Since a DataFrame is actually a set of columns, this is possible:
df = DataFrame(x=Int[],y=Int[])
append!(df.x,[1,2])
append!(df.y,[1,2,3])
However, since such a data frame does not make sense, you will not be able to work with it via the standard DataFrames API (it will be seen as a corrupt DataFrame):
julia> df[1,:]
DataFrameRowError showing value of type DataFrameRow{DataFrame,DataFrames.Index}:
ERROR: AssertionError: Data frame is corrupt: length of column :y (3) does not match length of column 1 (2). The column vector has likely been resized unintentionally (either directly or because it is shared with another data frame).
Upvotes: 4
Reputation: 13800
Are you sure this is what you're trying to do? Normally one would expect that there are as many rows of features as there are rows of the target column, so this error might point to a conceptual issue in your code.
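As a hedged illustration of what such a conceptual issue could look like: 47040000 is exactly 784 × 60000, so train_x may be a flattened array holding 784 values per sample. That is only a guess from the numbers in the question. The sketch below uses small hypothetical stand-in data (4 samples of 3 values each) and assumes each sample's values are stored contiguously; n_samples, n_features and M are names made up for the example.

```julia
using DataFrames

# Hypothetical stand-in data: 4 samples with 3 feature values each, stored flat.
# In the question this would be 60000 samples with 784 values each (an assumption).
n_samples = 4
n_features = 3
train_x = collect(1.0:(n_samples * n_features))  # flat vector of 12 values
train_y = [0, 1, 0, 1]                            # one label per sample

# Reshape so that each column of M holds the values of one sample.
# This assumes the values of a sample are contiguous in train_x.
M = reshape(train_x, n_features, n_samples)

# Store one feature vector per row, so both columns have n_samples entries.
features = [M[:, j] for j in 1:n_samples]
df = DataFrame(:Feature => features, :Target => train_y)
```

With this layout the data frame has one row per sample, and the DimensionMismatch does not occur.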
If you absolutely have to do this though, I see two options:
1. Pad the shorter column with missing or some value of your choice, so :Target => [train_y; fill(missing, length(train_x) - length(train_y))]. Here I'm padding at the end of the vector, which might or might not be appropriate in your case.
2. Do a leftjoin of a dataframe with your train_x column onto a dataframe with your train_y column - for this you will need an index column in both DataFrames that describes how the rows of y match to x. If you just add a running index 1:length(train_*) to both DataFrames, the result will be the same as padding the end of train_y with missing.
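Both options can be sketched on small made-up vectors. The data and the names n_pad, padded_y, df_x, df_y and idx are hypothetical and only serve the example; leftjoin assumes a reasonably recent DataFrames.jl version.

```julia
using DataFrames

# Hypothetical stand-in data with mismatched lengths.
train_x = [10, 20, 30, 40, 50]
train_y = [1, 2, 3]

# Option 1: pad the end of the shorter vector with missing.
n_pad = length(train_x) - length(train_y)
padded_y = [train_y; fill(missing, n_pad)]
df_pad = DataFrame(:Feature => train_x, :Target => padded_y)

# Option 2: leftjoin on a running index column present in both data frames.
df_x = DataFrame(:idx => 1:length(train_x), :Feature => train_x)
df_y = DataFrame(:idx => 1:length(train_y), :Target => train_y)
df_join = leftjoin(df_x, df_y, on = :idx)

# Sort by the index so the row order does not depend on the join implementation.
sort!(df_join, :idx)
```

After sorting, the Target column of df_join should hold the same values as the one in df_pad, with missing in the rows that have no matching index in df_y.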
Upvotes: 5