Reputation: 131
I have multiple sets of data, and in each set of data there is a region that is somewhat banana shaped and two regions that are dense blobs. I have been able to differentiate these regions from the rest of the data using a DBSCAN algorithm, but I'd like to use a supervised algorithm to have the program then know which cluster is the banana, and which two clusters are the dense blobs, and I'm not sure where to start.
As there are 3 categories (banana, blob, neither), would doing two separate logistic regressions be the best approach (evaluate if it is banana or not-banana and if it is blob or not-blob)? or is there a good way to incorporate all 3 categories into one neural network?
Here are three data sets. In each, the banana is red. In the 1st, the two blobs are green and blue, in the 2nd the blobs are cyan and green, and in the the 3rd the blobs are blue and green. I'd like the program to (now that is has differentiated the different regions, to then label the banana and blob regions so I don't have to hand pick them every time I run the code.
Upvotes: 2
Views: 3919
Reputation: 77454
I believe you are still unclear about what you want to achieve.
That of course makes it hard to give you a good answer.
Your data seems to be 3D. In 3D you could for example compute the alpha shape of a cluster, and check if it is convex. Because your "banana" probably is not convex, while your blobs are.
You could also measure e.g. whether the cluster center actually is inside your cluster. If it isn't, the cluster is not a blob. You can measure if the extends along the three axes are the same or not.
But in the end, you need some notion of "banana".
Upvotes: 0
Reputation: 66775
As you are using python
, one of the best options would be to start with some big library, offering many different approaches so you can choose which one suits you the best. One of such libraries is sklearn
http://scikit-learn.org/stable/ .
Getting back to the problem itself. What are the models you should try?
Of course there are many others, but I would recommend to start with these ones. All of them support multi-class classification, so you do not need to worry how to encode the problem with three classes, simply create data in the form of two matrices x
and y
where x
are input values and y
is a vector of corresponding classes (eg. numbers from 1
to 3
).
Visualization of different classifiers from the library:
So it remains a question how to represent shape of a cluster - we need a fixed length real valued vector, so what can features actually represent?
There is quite comprehensive list and detailed overview here (for three-dimensional objects): http://web.ist.utl.pt/alfredo.ferreira/publications/DecorAR-Surveyon3DShapedescriptors.pdf
There is also quite informative presentation: http://www.global-edge.titech.ac.jp/faculty/hamid/courses/shapeAnalysis/files/3.A.ShapeRepresentation.pdf
Describing some descriptors and how to make them scale/position/rotation invariant (if it is relevant here)
Upvotes: 4
Reputation: 1087
Could Neural networks help , the "pybrain" library might be the best for it.
You could set up the neural net as a feed forward network. set it so that there is an output for each class of object you expect the data to contain.
Edit :sorry if I have completely misinterpreted the question. I'm assuming you have preexisting data you can feed to train the networks to differentiate clusters.
If there are 3 categories you could have 3 outputs to the NN or perhaps a single NN for each one that simply outputs a true or false value.
Upvotes: 0