A large part of a data scientist’s work involves tuning hyperparameters: making decisions about what happens to the data at each step of the process. Such tuning takes significant time and effort, and small differences in hyperparameters can significantly affect the performance of a pipeline. Through our system, “Deep Mining,” we seek to automatically tune the entire data processing pipeline, not just the classification algorithms as in our previous project, ATM. This involves standardizing pipeline abstractions and building and testing several hyperparameter selection and optimization methods. Because the earlier stages of a pipeline can be computationally expensive, we also focus on determining the most efficient distribution strategy and on a sampling-based performance estimator.
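To make the idea concrete, here is a minimal sketch, in plain Python, of what tuning an entire pipeline (rather than only the final classifier) looks like. The two-stage pipeline, its hyperparameters (a preprocessing clip bound and a decision threshold), and the grid search are all hypothetical illustrations, not the Deep Mining API; the point is that the stages' hyperparameters interact, so they must be searched jointly.

```python
import random

random.seed(0)

# Synthetic 1-D task: the true label is 1 when the underlying value
# exceeds 5; the observed feature is a noisy version of that value.
data = [(x + random.gauss(0, 1.0), int(x > 5))
        for x in [random.uniform(0, 10) for _ in range(200)]]

def preprocess(x, clip):
    # Stage 1 (hypothetical): clip the feature to [-clip, clip].
    return max(-clip, min(clip, x))

def classify(x, threshold):
    # Stage 2 (hypothetical): threshold classifier.
    return int(x > threshold)

def score(clip, threshold):
    # Accuracy of the whole pipeline for one hyperparameter setting.
    hits = sum(classify(preprocess(x, clip), threshold) == y
               for x, y in data)
    return hits / len(data)

# Grid search over the *joint* hyperparameter space of both stages.
# Note the interaction: clip=5 with threshold=5 maps every positive
# example onto the decision boundary, so tuning the classifier's
# threshold alone cannot recover from a bad preprocessing choice.
best = max(
    ((clip, thr, score(clip, thr))
     for clip in [1, 3, 5, 7, 9]
     for thr in [2, 4, 5, 6, 8]),
    key=lambda t: t[2],
)
print("best (clip, threshold, accuracy):", best)
```

A real tuner would replace the grid with a smarter selection method (e.g. Bayesian optimization) and evaluate expensive early stages on samples of the data, which is exactly where the distribution strategy and the sampling-based performance estimator come in.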