Reputation: 4836
I'm not even sure if this is possible with data.table. I have a data set that looks like the following. It's a data frame, but I later convert it to a data.table called x.
id xcord ycord
a 2 3
a 3 4
a 3 3
a 9 10
a 8 9
b 1 3
b 1 2
b 8 19
b 7 21
I want to identify two clusters per id, and that is proving to be difficult. I tried the following:
x = x[,list(x1 = kmeans(xcord,centers=2)$centers, y1 = kmeans(ycord,centers=2)$centers),by = id]
but it gave the following error message.
All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
Calls: [ -> [.data.table
Execution halted
I'm expecting a data table with entries that can be "treated" as a list of centers. Is this even possible?
Upvotes: 1
Views: 2315
Reputation: 115382
The centers element is a matrix (it will contain as many columns as there are columns in the x argument to kmeans).
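As a quick check of that point, here is a minimal sketch using the xcord and ycord values from group a of the question. The variable names cen1 and cen2 are only illustrative, and set.seed is there because kmeans picks random starting centers.

```r
# Values copied from the question's rows with id "a"
xcord <- c(2, 3, 3, 9, 8)
ycord <- c(3, 4, 3, 10, 9)

set.seed(1)  # kmeans uses random starting centers

# Even for a plain vector, $centers is a matrix (k rows, 1 column),
# which is why list(x1 = ...$centers) is rejected inside j
cen1 <- kmeans(xcord, centers = 2)$centers
is.matrix(cen1)
dim(cen1)

# With a two-column matrix as input, $centers has two named columns
cen2 <- kmeans(cbind(xcord, ycord), centers = 2)$centers
colnames(cen2)
```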
If you want to find the clusters considering xcord and ycord in the same clustering run, you will need to pass a matrix to kmeans. You will then have to coerce the result back to a data.table afterwards, which keeps the column names sensible.
# e.g. cluster on both coordinates at once, within each id
fx <- x[,data.table(kmeans(cbind(xcord,ycord),centers=2)$centers),by=id]
fx
# id xcord ycord
# 1: a 2.666667 3.333333
# 2: a 8.500000 9.500000
# 3: b 7.500000 20.000000
# 4: b 1.000000 2.500000
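If you also want to know which cluster each row falls into, and not only the centers, the same grouped call can return the cluster element instead. This is a sketch under assumptions: the data are retyped from the question, the column name cluster is my own choice, and nstart = 5 is added only to make a poor random start less likely. The labels 1 and 2 are arbitrary within each id and can swap between runs.

```r
library(data.table)

# Data retyped from the question
x <- data.table(
  id    = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  xcord = c(2, 3, 3, 9, 8, 1, 1, 8, 7),
  ycord = c(3, 4, 3, 10, 9, 3, 2, 19, 21)
)

set.seed(42)  # kmeans uses random starting centers

# $cluster is a plain integer vector with one entry per row,
# so it can be assigned by reference within each id
x[, cluster := kmeans(cbind(xcord, ycord), centers = 2, nstart = 5)$cluster,
  by = id]
x
```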
Upvotes: 4