I am struggling with clustering of categorical data in ML.NET.
The line var predictor = mlContext.Model.CreatePredictionEngine<TransformedData, ClusterPredictionItem>(model); fails with the exception "System.InvalidOperationException: 'Incompatible features column type: 'Vector' vs 'Vector''".
I'm quite new to ML. Can someone help?
Thanks!
using Microsoft.ML;

class Program
{
    static void Main(string[] args)
    {
        var mlContext = new MLContext();
        var samples = new[]
        {
            new DataPoint {Education = "0-5yrs", ZipCode = "98005"},
            new DataPoint {Education = "0-5yrs", ZipCode = "98052"},
            new DataPoint {Education = "6-11yrs", ZipCode = "98005"},
            new DataPoint {Education = "6-11yrs", ZipCode = "98052"},
            new DataPoint {Education = "11-15yrs", ZipCode = "98005"}
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Step 1: one-hot encode both categorical columns (fitted on its own)
        var multiColumnKeyPipeline =
            mlContext.Transforms.Categorical.OneHotEncoding(
                new[]
                {
                    new InputOutputColumnPair("Education"),
                    new InputOutputColumnPair("ZipCode")
                });
        IDataView transformedData =
            multiColumnKeyPipeline.Fit(data).Transform(data);

        // Step 2: concatenate the encoded columns and train KMeans on the transformed data
        string featuresColumnName = "Features";
        var pipeline = mlContext.Transforms
            .Concatenate(featuresColumnName, "Education", "ZipCode")
            .Append(mlContext.Clustering.Trainers.KMeans(featuresColumnName, numberOfClusters: 2));
        var model = pipeline.Fit(transformedData);

        // This line throws the InvalidOperationException
        var predictor = mlContext.Model.CreatePredictionEngine<TransformedData, ClusterPredictionItem>(model);
    }

    private class DataPoint
    {
        public string Education { get; set; }
        public string ZipCode { get; set; }
    }

    private class TransformedData
    {
        public float Education { get; set; }
        public float ZipCode { get; set; }
    }

    internal class ClusterPredictionItem
    {
    }
}
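I suspect you see this issue because you have split your pipeline in two. The one-hot encoding is fitted and applied on its own, and the trainer is then fitted on the IDataView that comes out of that transformation. The encoding is therefore not part of the model you pass to CreatePredictionEngine. At prediction time the engine reads Education and ZipCode as the two scalar floats declared in TransformedData, so the Features column it builds no longer has the size of the concatenated one-hot vectors that KMeans was trained on. If you merge the one-hot encoding and the trainer into one pipeline, you can also simplify your code: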
IDataView data = mlContext.Data.LoadFromEnumerable(samples);
string featuresColumnName = "Features";

// One-hot encoding, concatenation and KMeans in a single pipeline,
// so the fitted model also contains the encoding step
var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(
        new[]
        {
            new InputOutputColumnPair("Education"),
            new InputOutputColumnPair("ZipCode")
        })
    .Append(mlContext.Transforms.Concatenate(featuresColumnName, "Education", "ZipCode"))
    .Append(mlContext.Clustering.Trainers.KMeans(featuresColumnName, numberOfClusters: 2));

var model = pipeline.Fit(data);

// The input type is now the raw DataPoint, not TransformedData
var predictor = mlContext.Model.CreatePredictionEngine<DataPoint, ClusterPredictionItem>(model);
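Because the model now starts from the raw string columns, the prediction engine takes DataPoint as its input type, and it should work without the exception.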
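Note that ClusterPredictionItem is still empty in both snippets above, so Predict has no output columns to fill. Below is a minimal sketch of the full round trip. It assumes the Microsoft.ML NuGet package and the standard ML.NET KMeans output columns: PredictedLabel holds the assigned cluster as a 1-based key, and Score holds the distances to each cluster centroid. The names ClusteringDemo, PredictCluster, PredictedClusterId and Distances, and the fixed seed, are hypothetical choices for illustration. They are not part of the original code.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Input type: the raw categorical strings, as in DataPoint above
public class DataPoint
{
    public string Education { get; set; }
    public string ZipCode { get; set; }
}

// Output type: maps the KMeans output columns onto fields
// (the field names are hypothetical; the column names are ML.NET's)
public class ClusterPredictionItem
{
    [ColumnName("PredictedLabel")]
    public uint PredictedClusterId;   // 1-based cluster id

    [ColumnName("Score")]
    public float[] Distances;         // one distance per cluster centroid
}

public static class ClusteringDemo   // hypothetical helper class
{
    public static ClusterPredictionItem PredictCluster(DataPoint input)
    {
        // Fixed seed (an assumption) so repeated runs give the same clustering
        var mlContext = new MLContext(seed: 0);
        var samples = new[]
        {
            new DataPoint {Education = "0-5yrs", ZipCode = "98005"},
            new DataPoint {Education = "0-5yrs", ZipCode = "98052"},
            new DataPoint {Education = "6-11yrs", ZipCode = "98005"},
            new DataPoint {Education = "6-11yrs", ZipCode = "98052"},
            new DataPoint {Education = "11-15yrs", ZipCode = "98005"}
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Encoding, concatenation and training in one pipeline
        var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(
                new[]
                {
                    new InputOutputColumnPair("Education"),
                    new InputOutputColumnPair("ZipCode")
                })
            .Append(mlContext.Transforms.Concatenate("Features", "Education", "ZipCode"))
            .Append(mlContext.Clustering.Trainers.KMeans("Features", numberOfClusters: 2));

        ITransformer model = pipeline.Fit(data);
        var predictor = mlContext.Model.CreatePredictionEngine<DataPoint, ClusterPredictionItem>(model);
        return predictor.Predict(input);
    }

    public static void Main()
    {
        var prediction = PredictCluster(new DataPoint { Education = "6-11yrs", ZipCode = "98005" });
        Console.WriteLine($"Cluster: {prediction.PredictedClusterId}");
        Console.WriteLine($"Distances: {string.Join(" ", prediction.Distances)}");
    }
}
```

With numberOfClusters: 2, the returned cluster id is 1 or 2 and Distances holds two values. Which of the two clusters a given row falls into depends on the training run.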