where $n_j$ is the number of chemicals in cluster $j$, $C_{j,i}$ is the centroid (or average value) of descriptor $i$ for cluster $j$, and $d$ is the number of descriptors in the EPA pool of descriptors (~800) (15). The process of combining clusters while minimizing variance continues until all of the chemicals are lumped into a single cluster. After the clustering is complete, each cluster is analyzed to determine whether an acceptable QSAR model can be developed. A genetic algorithm is used to select descriptors to build a multilinear regression model for each cluster (15). As in the kNN approach, each model must achieve a minimum leave-one-out cross-validation (LOO-CV) accuracy to be used in making predictions. The predicted value for a given test chemical is the equally weighted average of the model predictions from the closest cluster at each step of the hierarchical clustering. This method was previously shown to yield the best results for another acute toxicity endpoint, IGC50 (50% inhibitory concentration of population growth) of Tetrahymena pyriformis (15).
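
The clustering and averaging steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in (15), and it rests on several assumptions. The merge cost is the standard Ward criterion, which may differ in form from the variance expression given here. The "closest" cluster is taken to be the one with the nearest centroid in descriptor space. A cluster that stays closest over several steps is counted only once. The function names (`centroid`, `ward_cost`, `agglomerate`, `predict`) and the `models` dictionary are hypothetical. In the usage example, constant functions stand in for the genetic-algorithm-selected regression models, and clusters whose models failed validation are simply absent from `models`.

```python
from itertools import combinations


def centroid(points):
    """Average value of each descriptor over the chemicals in a cluster."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(u, v):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def ward_cost(a, b):
    """Variance increase from merging clusters a and b.

    Standard Ward criterion (an assumption, not taken from the source).
    """
    n_a, n_b = len(a), len(b)
    return n_a * n_b / (n_a + n_b) * sq_dist(centroid(a), centroid(b))


def agglomerate(X):
    """Merge the cheapest pair of clusters until a single cluster remains.

    Returns the list of active clusters at every step; each cluster is a
    tuple of row indices into X.
    """
    active = [(i,) for i in range(len(X))]
    steps = [list(active)]
    while len(active) > 1:
        a, b = min(
            combinations(active, 2),
            key=lambda p: ward_cost([X[i] for i in p[0]], [X[i] for i in p[1]]),
        )
        merged = tuple(sorted(a + b))
        active = [c for c in active if c not in (a, b)] + [merged]
        steps.append(list(active))
    return steps


def predict(x, X, steps, models):
    """Equally weighted average over the closest cluster at each step.

    `models` maps a cluster (tuple of indices) to a callable; clusters
    without an accepted model are skipped. Counting a repeated closest
    cluster only once is an assumption of this sketch.
    """
    seen, preds = set(), []
    for clusters in steps:
        closest = min(
            clusters, key=lambda c: sq_dist(x, centroid([X[i] for i in c]))
        )
        if closest in models and closest not in seen:
            seen.add(closest)
            preds.append(models[closest](x))
    return sum(preds) / len(preds) if preds else None


# Toy usage: four chemicals described by a single descriptor.
X = [[0.0], [1.0], [10.0], [11.0]]
steps = agglomerate(X)
# Hypothetical stand-ins for validated per-cluster regression models.
models = {
    (0, 1): lambda x: 2.0,
    (2, 3): lambda x: 8.0,
    (0, 1, 2, 3): lambda x: 5.0,
}
print(predict([0.4], X, steps, models))  # 3.5
```

In this toy case the test chemical is closest to cluster (0, 1) and then to the final all-inclusive cluster, so the prediction is the mean of those two model outputs. The exhaustive pair search is chosen for readability and would be too slow for a real training set.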