Reputation: 189
I get a ValueError when running the code below. I thought it was caused by the iloc code that splits the data into x and y, but I can't see what I'm doing wrong:
if st.checkbox('Select Multiple Columns'):
    new_data = st.multiselect(
        "Select the target columns. Please note, the target variable should be the last column selected",
        df.columns)
    df1 = df[new_data]
    st.dataframe(df1)
    # dividing data into X and Y variables
    x = df1.iloc[:, :-1]
    y = df1.iloc[:-1]
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    st.write('Prediction:', y_pred)
The error I get is the following:
ValueError: Found input variables with inconsistent numbers of samples: [196, 195]
Snippet of the dataset:
1/1/20 X 2020 206457
1/1/20 X 2021 70571
1/1/20 X 2022 46918
1/1/20 X 2023 36492
1/1/20 X 2024 0
1/1/20 X 2025 0
1/1/20 X 2020 286616
1/1/20 X 2021 134276
1/1/20 X 2022 87674
1/1/20 X 2023 240
1/1/20 X 2024 0
1/1/20 X 2025 0
Upvotes: 0
Views: 8691
Reputation: 23217
Check these two statements in your code:
x = df1.iloc[:, :-1]
y = df1.iloc[:-1]
x and y slice df1 differently: x keeps every row, while y has one row fewer. Hence the inconsistent numbers of samples, [196, 195]: 196 for x and 195 for y.
Note that the first argument of iloc[] slices rows, while the second slices columns.
x takes all rows and every column except the last. y is given only one argument, so it slices rows only (dropping the last row) and, because no column slice is specified, keeps all columns.
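To make the difference concrete, here is a minimal sketch comparing the two slices, assuming the target really is the last selected column. The small frame and its column names are made up for illustration and are not the asker's data; the likely fix is to give y a column index as the second argument.

```python
import pandas as pd

# Hypothetical stand-in for df1; the column names are invented for this example.
df1 = pd.DataFrame({
    "month": [1, 1, 1, 1, 1, 1],
    "year": [2020, 2021, 2022, 2023, 2024, 2025],
    "amount": [206457, 70571, 46918, 36492, 0, 0],
})

# Slices as written in the question.
x_bad = df1.iloc[:, :-1]  # all rows, every column except the last
y_bad = df1.iloc[:-1]     # every row except the last, all columns

# Likely intended slices: same rows for both, last column as the target.
x = df1.iloc[:, :-1]      # features: all rows, every column except the last
y = df1.iloc[:, -1]       # target: all rows, last column only (a Series)

print(x_bad.shape, y_bad.shape)  # (6, 2) (5, 3)
print(x.shape, y.shape)          # (6, 2) (6,)
```

With both slices covering the same rows, x and y have the same number of samples, which is what train_test_split checks for.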
Upvotes: 2