Class KMeansClusterer<T>


  • public class KMeansClusterer<T>
    extends java.lang.Object
    Groups items into a specified number of clusters, based on their proximity in d-dimensional space, using the k-means algorithm. A call to cluster terminates when either of the following conditions holds:
    • the number of iterations exceeds max_iterations
    • no centroid has moved by as much as convergence_threshold since the previous iteration
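The two stopping rules above can be sketched as a single check. This is a minimal illustration, not the library's code: `TerminationCheck`, `shouldStop`, and `distance` are hypothetical names, and the sketch assumes Euclidean distance (this page does not name the metric) and that the i-th old and i-th new centroid belong to the same cluster.

```java
import java.util.List;

/** Hypothetical helper illustrating the two stopping rules; not part of the library. */
class TerminationCheck {

    /** Euclidean distance between two points of equal dimension (assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns true when clustering should stop: either the iteration count exceeds
     * maxIterations, or no centroid moved by at least threshold since the last pass.
     */
    static boolean shouldStop(int iteration, int maxIterations,
                              List<double[]> oldCentroids, List<double[]> newCentroids,
                              double threshold) {
        if (iteration > maxIterations) {
            return true; // rule 1: iteration budget exhausted
        }
        for (int i = 0; i < oldCentroids.size(); i++) {
            if (distance(oldCentroids.get(i), newCentroids.get(i)) >= threshold) {
                return false; // some centroid still moved as much as the threshold
            }
        }
        return true; // rule 2: every centroid moved less than the threshold
    }
}
```

Either rule alone is enough to stop, which is why the iteration check short-circuits before any distances are computed.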
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  KMeansClusterer.NotEnoughClustersException
      An exception that indicates that the specified data points cannot be clustered into the number of clusters requested by the user.
    • Constructor Summary

      Constructors 
      Constructor Description
      KMeansClusterer()
      Creates an instance with a maximum of 100 iterations and a convergence threshold of 0.001.
      KMeansClusterer(int max_iterations, double convergence_threshold)
      Creates an instance whose termination conditions are set according to the parameters.
    • Method Summary

      Modifier and Type Method Description
      protected java.util.Map<double[], java.util.Map<T, double[]>> assignToClusters(java.util.Map<T, double[]> object_locations, java.util.Set<double[]> centroids)
      Assigns each object to the cluster whose centroid is closest to the object.
      java.util.Collection<java.util.Map<T, double[]>> cluster(java.util.Map<T, double[]> object_locations, int num_clusters)
      Returns a Collection of clusters, where each cluster is represented as a Map of Objects to locations in d-dimensional space.
      double getConvergenceThreshold()
      Returns the convergence threshold.
      int getMaxIterations()
      Returns the maximum number of iterations.
      void setConvergenceThreshold(double convergence_threshold)
      Sets the convergence threshold.
      void setMaxIterations(int max_iterations)
      Sets the maximum number of iterations.
      void setSeed(int random_seed)
      Sets the seed used by the internal random number generator.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • max_iterations

        protected int max_iterations
      • convergence_threshold

        protected double convergence_threshold
      • rand

        protected java.util.Random rand
    • Constructor Detail

      • KMeansClusterer

        public KMeansClusterer(int max_iterations,
                               double convergence_threshold)
        Creates an instance whose termination conditions are set according to the parameters.
      • KMeansClusterer

        public KMeansClusterer()
        Creates an instance with a maximum of 100 iterations and a convergence threshold of 0.001.
    • Method Detail

      • getMaxIterations

        public int getMaxIterations()
        Returns the maximum number of iterations.
      • setMaxIterations

        public void setMaxIterations(int max_iterations)
        Sets the maximum number of iterations.
      • getConvergenceThreshold

        public double getConvergenceThreshold()
        Returns the convergence threshold.
      • setConvergenceThreshold

        public void setConvergenceThreshold(double convergence_threshold)
        Sets the convergence threshold.
        Parameters:
        convergence_threshold - the minimum distance at least one centroid must move between iterations for clustering to continue
      • cluster

        public java.util.Collection<java.util.Map<T, double[]>> cluster(java.util.Map<T, double[]> object_locations,
                                                                        int num_clusters)
        Returns a Collection of clusters, where each cluster is represented as a Map of Objects to locations in d-dimensional space.
        Parameters:
        object_locations - a map from the objects to be clustered to double arrays that specify their locations in d-dimensional space
        num_clusters - the number of clusters to create
        Throws:
        KMeansClusterer.NotEnoughClustersException - if the data points cannot be clustered into num_clusters clusters
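A compact, self-contained sketch of what a call like cluster might do, written in plain Java without the library: distinct initial centroids chosen at random, then alternating assignment and mean-update steps until one of the two stopping rules fires. Everything here is an assumption for illustration: `KMeansSketch`, its `NotEnoughClustersException` stand-in, Euclidean distance, and the rule that the exception is thrown when there are fewer distinct locations than requested clusters. The library's own initialization and error conditions may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical stand-alone k-means sketch mirroring the cluster() signature; not the library's source. */
class KMeansSketch {

    /** Hypothetical stand-in for KMeansClusterer.NotEnoughClustersException. */
    static class NotEnoughClustersException extends RuntimeException {
        NotEnoughClustersException(String message) {
            super(message);
        }
    }

    /** Euclidean distance (an assumption; the page does not name the metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static <T> Collection<Map<T, double[]>> cluster(Map<T, double[]> locations, int numClusters,
                                                    int maxIterations, double threshold, Random rand) {
        // Initialization: take distinct locations, in random order, as the starting centroids.
        List<double[]> candidates = new ArrayList<>(locations.values());
        Collections.shuffle(candidates, rand);
        List<double[]> centroids = new ArrayList<>();
        for (double[] candidate : candidates) {
            boolean duplicate = false;
            for (double[] existing : centroids) {
                if (Arrays.equals(existing, candidate)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                centroids.add(candidate);
            }
            if (centroids.size() == numClusters) {
                break;
            }
        }
        if (centroids.size() < numClusters) {
            throw new NotEnoughClustersException(
                    "only " + centroids.size() + " distinct locations for " + numClusters + " clusters");
        }

        for (int iteration = 1; ; iteration++) {
            // Assignment step: each object joins the cluster of its nearest centroid.
            List<Map<T, double[]>> clusters = new ArrayList<>();
            for (int i = 0; i < numClusters; i++) {
                clusters.add(new LinkedHashMap<>());
            }
            for (Map.Entry<T, double[]> entry : locations.entrySet()) {
                int nearest = 0;
                for (int i = 1; i < numClusters; i++) {
                    if (distance(entry.getValue(), centroids.get(i))
                            < distance(entry.getValue(), centroids.get(nearest))) {
                        nearest = i;
                    }
                }
                clusters.get(nearest).put(entry.getKey(), entry.getValue());
            }
            // Update step: move each centroid to the mean of its members, tracking the largest move.
            double maxShift = 0.0;
            for (int i = 0; i < numClusters; i++) {
                Map<T, double[]> members = clusters.get(i);
                if (members.isEmpty()) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double[] mean = new double[centroids.get(i).length];
                for (double[] p : members.values()) {
                    for (int d = 0; d < mean.length; d++) {
                        mean[d] += p[d] / members.size();
                    }
                }
                maxShift = Math.max(maxShift, distance(mean, centroids.get(i)));
                centroids.set(i, mean);
            }
            // Stop once no centroid moved by as much as the threshold, or after maxIterations passes.
            if (maxShift < threshold || iteration >= maxIterations) {
                return clusters;
            }
        }
    }
}
```

Letting an empty cluster keep its old centroid is only one common choice; others re-seed it from a random point.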
      • assignToClusters

        protected java.util.Map<double[], java.util.Map<T, double[]>> assignToClusters(java.util.Map<T, double[]> object_locations,
                                                                                       java.util.Set<double[]> centroids)
        Assigns each object to the cluster whose centroid is closest to the object.
        Parameters:
        object_locations - a map of objects to locations
        centroids - the centroids of the clusters to be formed
        Returns:
        a map from each centroid to its cluster, where each cluster maps the objects assigned to that centroid to their locations
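The return shape, centroid to (object to location), can be sketched as below. This is an illustrative nearest-centroid assignment under an assumed Euclidean metric, not the library's source; `AssignmentSketch` is a hypothetical name. Comparing squared distances is enough to find the nearest centroid, because squaring preserves the ordering of non-negative distances.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the nearest-centroid assignment step; not the library's source. */
class AssignmentSketch {

    /** Squared Euclidean distance; sufficient for choosing the nearest centroid. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static <T> Map<double[], Map<T, double[]>> assignToClusters(Map<T, double[]> locations,
                                                                Set<double[]> centroids) {
        // double[] does not override equals/hashCode, so these keys compare by identity:
        // callers must look clusters up with the same array instances they passed in.
        Map<double[], Map<T, double[]>> clusters = new LinkedHashMap<>();
        for (double[] c : centroids) {
            clusters.put(c, new LinkedHashMap<>());
        }
        for (Map.Entry<T, double[]> e : locations.entrySet()) {
            double[] nearest = null;
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centroids) {
                double d = squaredDistance(e.getValue(), c);
                if (d < best) {
                    best = d;
                    nearest = c;
                }
            }
            clusters.get(nearest).put(e.getKey(), e.getValue());
        }
        return clusters;
    }
}
```

Because Java arrays inherit identity-based `equals` and `hashCode` from `Object`, a map keyed by `double[]` centroids can only be queried with the very same array instances, not with equal-valued copies.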
      • setSeed

        public void setSeed(int random_seed)
        Sets the seed used by the internal random number generator. Fixing the seed makes outputs consistent from run to run.
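Why a fixed seed yields consistent outputs can be shown with `java.util.Random` alone: two generators constructed with the same seed produce the same sequence of values, so any random choice of initial centroids drawn from them is identical. `SeedSketch` and `pickInitialIndices` are hypothetical names, and the sketch assumes, without confirmation from this page, that the class draws its initial centroids from the `rand` field listed under Field Detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of seeded, repeatable centroid selection; not part of the library. */
class SeedSketch {

    /** Picks k distinct indices out of n points using the given generator. */
    static List<Integer> pickInitialIndices(int n, int k, Random rand) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        // Shuffling with a seeded Random is deterministic for a given seed.
        Collections.shuffle(indices, rand);
        return new ArrayList<>(indices.subList(0, k));
    }
}
```

Calling `pickInitialIndices(50, 3, new Random(7))` twice returns the same three indices both times; an unseeded `Random` gives no such guarantee.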