6:[["$","$Le",null,{}],["$","div",null,{"className":"min-h-screen bg-gray-100 p-6","children":[["$","$Lf",null,{}],["$","script",null,{"type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"{\"@context\":\"https://schema.org\",\"@type\":\"QAPage\",\"mainEntity\":{\"@type\":\"Question\",\"name\":\"Hadoop distributed version of K-Means?\",\"text\":\"

Wondering if there is an open source implementation for Hadoop distributed version of K-Means? Ask for Hadoop since data is big which cannot be hold into a single box.

\\n\\n

thanks in advance,\\nLin

\\n\",\"author\":{\"@type\":\"Person\",\"name\":\"Lin Ma\"},\"upvoteCount\":0,\"answerCount\":2,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"

Yes, there is, Mahaout has several k-means implementation e.g.: mahout.apache.org/users/clustering/k-means-clustering.html

\\n\",\"author\":{\"@type\":\"Person\",\"name\":\"bpgergo\"},\"upvoteCount\":1}}}"}}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mb-6 relative","children":[["$","div",null,{"className":"absolute top-4 right-4 flex flex-wrap space-x-2","children":[["$","span","hadoop",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/hadoop/1","children":"hadoop"}]}],["$","span","k-means",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/k-means/1","children":"k-means"}]}]]}],["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/f4c516ea2a612cdca29a5b48e9805cd9?s=256&d=identicon&r=PG","alt":"Lin Ma","className":"w-16 h-16 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/1850923/lin-ma","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Lin Ma"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",10139]}]]}]]}],["$","h1",null,{"className":"text-2xl font-bold text-gray-800 mb-4","children":"Hadoop distributed version of K-Means?"}],["$","p",null,{"className":"text-gray-700 mt-4","dangerouslySetInnerHTML":{"__html":"

Wondering if there is an open source implementation for Hadoop distributed version of K-Means? Ask for Hadoop since data is big which cannot be hold into a single box.

\n\n

thanks in advance,\nLin

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm mt-4","children":[["$","p",null,{"children":["Upvotes: ",0]}],["$","p",null,{"children":["Views: ",127]}]]}]]}],["$","div",null,{"className":"container mx-auto","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-6","children":["Answers (",2,")"]}],[["$","div","30501657",{"className":"bg-white shadow-md rounded-lg p-6 mb-6","children":[["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/ab6039a55dd17254255c00c7ab93030b?s=256&d=identicon&r=PG&f=y&so-version=2","alt":"Azwaw","className":"w-12 h-12 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/4098628/azwaw","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Azwaw"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",558]}]]}]]}],["$","p",null,{"className":"text-gray-700 mb-4","dangerouslySetInnerHTML":{"__html":"

You can use spark for this. Spark implements KMeans. Spark uses RDD (Resilient Distributed Dataset). Your data are distributed on your cluster and each node process closest data.

\n\n

Spark's performances can be better than Mahout because some of interemediate process are not written on HDFS.

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm","children":["$","p",null,{"children":["Upvotes: ",3]}]}]]}],["$","div","30501457",{"className":"bg-white shadow-md rounded-lg p-6 mb-6","children":[["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/edfaee6d982f519a6efdad5b2b6c12d0?s=256&d=identicon&r=PG","alt":"bpgergo","className":"w-12 h-12 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/176569/bpgergo","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"bpgergo"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",16037]}]]}]]}],["$","p",null,{"className":"text-gray-700 mb-4","dangerouslySetInnerHTML":{"__html":"

Yes, there is, Mahaout has several k-means implementation e.g.: mahout.apache.org/users/clustering/k-means-clustering.html

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm","children":["$","p",null,{"children":["Upvotes: ",1]}]}]]}]]]}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mt-6","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-4","children":"Related Questions"}],["$","ul",null,{"className":"list-disc list-inside","children":[["$","li","38988567",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/38988567","className":"text-blue-600 hover:underline","children":"Distributed alternatives to hadoop"}]}],["$","li","49626625",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/49626625","className":"text-blue-600 hover:underline","children":"Hadoop MapReduce without cluster - is it possible?"}]}],["$","li","20256124",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/20256124","className":"text-blue-600 hover:underline","children":"Hadoop - Pseudo-Distributed Operation"}]}],["$","li","21747648",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/21747648","className":"text-blue-600 hover:underline","children":"Hadoop Distributed File system"}]}],["$","li","16261209",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/16261209","className":"text-blue-600 hover:underline","children":"K-Means Algorithm Hadoop"}]}],["$","li","12827935",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/12827935","className":"text-blue-600 hover:underline","children":"Hadoop and multiple clusters"}]}],["$","li","10968084",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/10968084","className":"text-blue-600 hover:underline","children":"Distributed local clustering coefficient algorithm (MapReduce/Hadoop)"}]}],["$","li","22923853",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/22923853","className":"text-blue-600 hover:underline","children":"K-Means calculation on a distributed computation"}]}],["$","li","22250942",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/22250942","className":"text-blue-600 hover:underline","children":"Hadoop Distributed Cluster Machines"}]}],["$","li","12604505",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/12604505","className":"text-blue-600 hover:underline","children":"Kmeans on hadoop"}]}]]}]]}]]}],["$","$L11",null,{}],["$","$L12",null,{}],["$","$L13",null,{}],["$","$L14",null,{}],["$","$L15",null,{}]]

Hadoop distributed version of K-Means?

Answers (2)

Related Questions