Selected publications of Balázs Kégl

Boosting

    B. Kégl
    Open problem: A (missing) boosting-type convergence result for AdaBoost.MH with factorized multi-class classifiers

    In Conference on Computational Learning Theory (open problems track), 2014.

    In the companion paper below we showed that AdaBoost.MH works very well in practice with factorized base classifiers. This note states the open problem of proving the exponential convergence of the algorithm.

    B. Kégl
    The return of AdaBoost.MH: multi-class Hamming trees

    In International Conference on Learning Representations, 2014.

    We train vector-valued decision trees within the framework of AdaBoost.MH. The key element of the method is a vector-valued decision stump, factorized into an input-independent vote vector of length K and a label-independent scalar classifier.
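
    A minimal sketch of such a factorized stump, assuming binary labels in {-1, +1} and exhaustive threshold search; the names are illustrative, not taken from the MultiBoost implementation.

        import numpy as np

        def factorized_stump(X, Y, W):
            """Find a scalar stump phi(x) = sign(x[j] - b) and a vote vector v
            over the K labels so that the vector-valued classifier
            h(x) = v * phi(x) maximizes the weighted edge
            sum_{i,k} W[i,k] * Y[i,k] * v[k] * phi(x_i)."""
            n, d = X.shape
            best = (-np.inf, None, None, None)
            for j in range(d):
                for b in np.unique(X[:, j]):
                    phi = np.sign(X[:, j] - b)                  # scalar stump
                    # for a fixed phi, the per-label edges determine the optimal votes
                    edges = (W * Y * phi[:, None]).sum(axis=0)  # length-K vector
                    v = np.sign(edges)                          # optimal vote vector
                    edge = np.abs(edges).sum()
                    if edge > best[0]:
                        best = (edge, j, b, v)
            return best[1], best[2], best[3]                    # feature, threshold, votes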

    I. Hegedüs, R. Busa-Fekete, R. Ormándi, M. Jelasity, and B. Kégl
    Peer-to-peer multi-class boosting

    In International European Conference on Parallel and Distributed Computing, pages 389–400, June 2012.

    D. Benbouzid, R. Busa-Fekete, and B. Kégl
    Fast classification using sparse decision DAGs

    In International Conference on Machine Learning, Volume 29, June 2012.

    We build sparse decision DAGs (directed acyclic graphs) from a list of base classifiers using a Markov decision process (MDP). For each instance, the learned policy can decide to use or to skip each base classifier, based on the current state of the classifier being built. The algorithm clearly outperforms state-of-the-art cascade detectors. It is also readily applicable to multi-class classification.
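
    The test-time control flow can be sketched as follows; the policy function is a placeholder for whatever the MDP training produces, and all names are hypothetical.

        import numpy as np

        EVAL, SKIP, QUIT = 0, 1, 2   # actions available to the learned policy

        def classify_with_dag(x, base_classifiers, alphas, policy):
            """Walk through the ordered base classifiers; for each one the
            policy, a function of the running score and the position, chooses
            to evaluate it, to skip it, or to quit and predict early."""
            score = 0.0
            for t, (h, alpha) in enumerate(zip(base_classifiers, alphas)):
                action = policy(score, t)
                if action == QUIT:           # enough evidence accumulated
                    break
                if action == EVAL:           # pay the cost of evaluating h
                    score += alpha * h(x)
                # on SKIP, move on without evaluating h
            return np.sign(score)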

    D. Benbouzid, R. Busa-Fekete, N. Casagrande, F.-D. Collin, and B. Kégl
    MultiBoost: a multi-purpose boosting package

    Journal of Machine Learning Research, 13:549–553, 2012.

    The MultiBoost package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on AdaBoost.MH but it also implements popular cascade classifiers and FilterBoost. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework.

    R. Busa-Fekete and B. Kégl
    Fast boosting using adversarial bandits

    In International Conference on Machine Learning, Volume 27, pages 143–150, June 2010.

    We improved on our previous bandit boosting paper by using an adversarial bandit algorithm instead of stochastic bandits. The practical performance is comparable, but this choice allows us to prove a weak-to-strong-learning theorem, which means that the proposed technique remains a boosting algorithm in a formal sense.

    R. Busa-Fekete and B. Kégl
    Accelerating AdaBoost using UCB

    In KDDCup 2009 (JMLR workshop and conference proceedings), Volume 7, pages 111–122, 2009.

    We show how to accelerate AdaBoost using multi-armed bandits (MABs). The basic idea is that we gradually learn the utility of the features (or, in general, subsets of base learners) using MABs, and use this utility to bias the random selection in each boosting iteration.
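
    A minimal sketch of the selection step with UCB1, assuming one arm per feature and the weighted edge of the trained stump as the reward; the function names are illustrative.

        import numpy as np

        def ucb_pick(counts, mean_reward, t, c=1.0):
            """UCB1 index over arms (features or base-learner subsets): pick
            the arm maximizing estimated utility plus an exploration bonus."""
            untried = np.where(counts == 0)[0]
            if len(untried):                 # play every arm once first
                return int(untried[0])
            bonus = c * np.sqrt(np.log(t + 1) / counts)
            return int(np.argmax(mean_reward + bonus))

        # Inside the boosting loop (sketch): pick j = ucb_pick(counts, mean_reward, t),
        # train the stump on feature j only, observe its weighted edge as the reward,
        # then update counts[j] += 1 and
        # mean_reward[j] += (edge - mean_reward[j]) / counts[j].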

    B. Kégl and R. Busa-Fekete
    Boosting products of base classifiers

    In International Conference on Machine Learning, Volume 26, pages 497–504, June 2009.

    We show how to boost products of base learners. Similarly to trees, we invoke the base learner as a subroutine, but in an iterative rather than a recursive fashion. In experiments on relatively large data sets we clearly outperform boosted trees. We also propose an improved base learner for nominal features and show that boosting the product of two of these new subset indicator base learners solves the maximum margin matrix factorization problem.
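
    A minimal sketch of the product base learner, assuming a weighted stump learner fit_stump(X, y, w) that returns a {-1, +1}-valued predictor; fit_stump and the sweep schedule are assumptions, not the paper's exact procedure.

        import numpy as np

        def learn_product(X, y, w, fit_stump, m=3, n_sweeps=5):
            """h(x) = h_1(x) * ... * h_m(x), each factor in {-1, +1}. Each
            factor is refit in turn while the others are held fixed: since
            the other factors take values in {-1, +1}, their product acts as
            a per-instance relabeling, and refitting one factor is again an
            ordinary weighted stump problem."""
            factors = [lambda X: np.ones(len(X))] * m        # start with constants
            for _ in range(n_sweeps):
                for j in range(m):
                    rest = np.ones(len(X))
                    for k in range(m):
                        if k != j:
                            rest *= factors[k](X)
                    factors[j] = fit_stump(X, y * rest, w)   # relabeled problem
            return lambda X: np.prod([h(X) for h in factors], axis=0)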

    S. Gambs, B. Kégl, and E. Aïmeur
    Privacy-preserving boosting

    Data Mining and Knowledge Discovery, 14(1):131–170, 2007.

    How to run AdaBoost when the data set is divided among two or more participants who don't want to share possibly sensitive details of their data.

    B. Kégl and L. Wang
    Boosting on manifolds: adaptive regularization of base classifiers

    In Advances in Neural Information Processing Systems, Volume 17, pages 665–672. The MIT Press, Dec. 2004.

    How to apply an additive regularization penalty at the base-learner level, and what the implied global objective function is.

    B. Kégl
    Generalization error and algorithmic convergence of median boosting

    In Advances in Neural Information Processing Systems, Volume 17, pages 657–664. The MIT Press, Dec. 2004.

    B. Kégl
    Robust regression by boosting the median

    In 16th Conference on Computational Learning Theory, pages 258–272, Aug. 2003.

    B. Kégl
    Confidence-rated regression by boosting the median

    Technical Report 1241, Department of Computer Science, University of Montreal, 2003.

    This TR contains all the detailed information on MedBoost published in the two conference papers above. It is interesting that the "natural" extension of AdaBoost to regression implies that one should use the weighted median of the base regressors instead of the weighted mean usually used in additive models. By "natural" I mean that one can prove a weak-to-strong-learning theorem. While this was already noted in Freund and Schapire's original AdaBoost paper, the original algorithm is not practical without using Rätsch and Warmuth's marginal boosting idea.
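
    The weighted-median combination step in a minimal sketch (the numeric example is illustrative, not taken from the paper):

        import numpy as np

        def weighted_median(values, weights):
            """Smallest value v such that the predictions at or below v carry
            at least half of the total weight; MedBoost combines the base
            regressors this way instead of taking their weighted mean."""
            order = np.argsort(values)
            cum = np.cumsum(weights[order])
            return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

        # Base regressors predicting [1.0, 2.0, 10.0] with weights [0.2, 0.5, 0.3]:
        # the weighted mean (4.2) is dragged toward the outlier, while
        # weighted_median(np.array([1.0, 2.0, 10.0]), np.array([0.2, 0.5, 0.3]))
        # returns 2.0, which is where the robustness comes from.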

Ranking

    R. Busa-Fekete, B. Kégl, T. Éltető, and Gy. Szarvas
    Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers

    Machine Learning Journal, 93(2):261–292, 2013.

    We show that a pointwise ranking approach (based on multi-class classification, calibration, and mixing models) is competitive with more complex pairwise and listwise methods, especially on large data sets. Our extensive experimental study is itself an important contribution: we compare most of the existing learning-to-rank techniques on all of the available large-scale benchmark data sets using a standardized implementation of the NDCG score.
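
    The scoring step of such a pointwise approach can be sketched as follows, assuming the multi-class classifier already outputs calibrated posteriors over the relevance grades; the expected-grade score used here is one of several possible choices.

        import numpy as np

        def pointwise_ranking(posteriors, grade_values=None):
            """posteriors: (n_docs, n_grades) calibrated class probabilities.
            Collapse each row into a single real-valued score, here the
            expected relevance grade, and rank the documents by it."""
            n_docs, n_grades = posteriors.shape
            if grade_values is None:
                grade_values = np.arange(n_grades)   # grades 0 .. R-1
            scores = posteriors @ grade_values       # posterior mean of the grade
            return np.argsort(-scores)               # document indices, best first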

    R. Busa-Fekete, B. Kégl, T. Éltető, and Gy. Szarvas
    An apple-to-apple comparison of learning-to-rank algorithms in terms of Normalized Discounted Cumulative Gain

    In ECAI Workshop on Preference Learning, August 2012.

    R. Busa-Fekete, B. Kégl, T. Éltető, and Gy. Szarvas
    A robust ranking methodology based on diverse calibration of AdaBoost

    In European Conference on Machine Learning, Volume 22, pages 263–279, 2011.

    We show that a simple boosting-based pointwise ensemble approach with a slight listwise touch at the last combination step is competitive with sophisticated (and computationally expensive) ranking techniques.

    R. Busa-Fekete, B. Kégl, T. Éltető, and Gy. Szarvas
    Ranking by calibrated AdaBoost

    In Yahoo! Ranking Challenge 2010 (JMLR workshop and conference proceedings), Volume 14, pages 37–48, 2011.

    Our boosting-based pointwise ensemble approach ended up 6th in Track 1 and 11th in Track 2.

Adaptive optimization and MCMC

    A. Roodaki, R. Bardenet, O. Cappé, and B. Kégl
    Detection and estimation of high energy physical particles using Monte Carlo methods

    In GRETSI, Brest, France, September 2013.

    R. Bardenet, M. Brendel, B. Kégl, and M. Sebag
    Collaborative hyperparameter tuning

    In International Conference on Machine Learning, June 2013.

    We simultaneously tune hyperparameters of a given algorithm on several data sets with a surrogate-based ranking and optimization technique.

    R. Bardenet, O. Cappé, G. Fort, and B. Kégl
    Adaptive Metropolis with online relabeling

    In International Conference on Artificial Intelligence and Statistics, Volume 22, pages 91–99, Apr. 2012.

    We propose a novel adaptive MCMC algorithm named AMOR for efficiently simulating from permutation-invariant targets occurring in, for example, Bayesian analysis of mixture models.

    R. Bardenet and B. Kégl
    An adaptive Monte-Carlo Markov chain algorithm for inference from mixture signals

    Journal of Physics: Conference Series, 368(1):012044, 2012.

    We describe adaptive Metropolis with online relabeling (AMOR) for modeling mixture signals and illustrate the algorithm on a synthetic mixture model inspired by the muonic water Cherenkov signal of the surface detectors in the Pierre Auger Experiment.

    J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl
    Algorithms for hyper-parameter optimization

    In Advances in Neural Information Processing Systems, Volume 24. The MIT Press, Dec. 2011.

    We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion, and compare the methods on tasks of training neural networks and deep belief networks.
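
    The expected improvement criterion itself is easy to state; here is a minimal sketch for a minimized validation loss under a surrogate that predicts a Gaussian (mu, sigma) per candidate, with the sequential loop summarized in the comment.

        import numpy as np
        from scipy.stats import norm

        def expected_improvement(mu, sigma, best_so_far):
            """EI = E[max(best_so_far - loss, 0)] when the surrogate predicts
            loss ~ N(mu, sigma^2) for a candidate hyper-parameter setting."""
            sigma = np.maximum(sigma, 1e-12)
            z = (best_so_far - mu) / sigma
            return (best_so_far - mu) * norm.cdf(z) + sigma * norm.pdf(z)

        # Sequential loop (sketch): fit the surrogate on the settings evaluated
        # so far, score candidate settings by expected_improvement, evaluate
        # the best candidate on the real task, and repeat.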

    R. Bardenet and B. Kégl
    Surrogating the surrogate: accelerating Gaussian-process-based global optimization with a mixture cross-entropy algorithm

    In International Conference on Machine Learning, Volume 27, pages 55–62, June 2010.

    We propose a mixture cross-entropy optimizer and use it for merit function optimization in a Gaussian-process surrogate optimization framework. We outperform the classical single-Gaussian cross-entropy method when the fitness function is highly multimodal, and we improve on standard exhaustive search in GP-based surrogate optimization.
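
    A minimal sketch of a mixture cross-entropy optimizer, using scikit-learn's GaussianMixture as a stand-in for the paper's mixture model; f plays the role of the merit function, and all parameter values are illustrative.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def mixture_cem_maximize(f, dim, n_components=3, n_samples=200,
                                 elite_frac=0.2, n_iters=30, seed=0):
            """Sample from a Gaussian mixture, keep the elite fraction under
            f, refit the mixture on the elites, repeat. Unlike the classical
            single-Gaussian cross-entropy method, the mixture can track
            several modes of a multimodal f at once."""
            rng = np.random.default_rng(seed)
            samples = rng.uniform(-1.0, 1.0, size=(n_samples, dim))
            for _ in range(n_iters):
                scores = np.array([f(x) for x in samples])
                elites = samples[np.argsort(scores)[-int(elite_frac * n_samples):]]
                gm = GaussianMixture(n_components=n_components, reg_covar=1e-6)
                samples, _ = gm.fit(elites).sample(n_samples)
            scores = np.array([f(x) for x in samples])
            return samples[np.argmax(scores)]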

Unsupervised learning

    B. Kégl
    Correlation-based construction of neighborhood and edge features

    In International Conference on Learning Representations (workshop track), 2014.

    Motivated by an abstract notion of low-level edge detector filters, we propose a simple method of unsupervised feature construction based on pairwise statistics of features. In the first step, we construct neighborhoods of features by grouping features that correlate. Then we use these subsets as filters to produce new neighborhood features. Next, we connect neighborhood features that correlate, and construct edge features by subtracting correlated neighborhood features from each other.
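
    A minimal sketch of the three steps, with a single correlation threshold tau standing in for whatever grouping rule the paper uses; the columns of X are assumed standardized.

        import numpy as np

        def neighborhood_and_edge_features(X, tau=0.5):
            """(1) the neighborhood of a feature is the set of features that
            correlate with it above tau; (2) a neighborhood feature is the
            mean of its raw features, the neighborhood acting as a filter;
            (3) an edge feature is the difference of two neighborhood
            features that correlate above tau."""
            C = np.corrcoef(X, rowvar=False)
            neighborhoods = [np.where(C[j] > tau)[0] for j in range(X.shape[1])]
            N = np.column_stack([X[:, nb].mean(axis=1) for nb in neighborhoods])
            Cn = np.corrcoef(N, rowvar=False)
            E = [N[:, j] - N[:, k]
                 for j in range(N.shape[1]) for k in range(j + 1, N.shape[1])
                 if Cn[j, k] > tau]
            return N, (np.column_stack(E) if E else np.empty((X.shape[0], 0)))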

    A. N. Gorban, B. Kégl, D. C. Wunsch, and A. Zinovyev, editors
    Principal Manifolds for Data Visualization and Dimension Reduction

    Volume 58 of Lecture Notes in Computational Science and Engineering
    Springer, 2008.

    Proceedings of a small workshop organized by Alexander Gorban. The book has a Russian web site that used to come up as the third hit if you type главных компонент ("principal component" in Russian) into Google.

    N. Le Roux, Y. Bengio, P. Lamblin, M. Joliveau, and B. Kégl
    Learning the 2-D topology of images

    In Advances in Neural Information Processing Systems, Volume 20, pages 841–848. The MIT Press, Dec. 2007.

    Take a set of images and shuffle their pixels in an arbitrary way but using the same permutation for all the images. The two-dimensional pixel topology can be reconstructed by looking at the pairwise correlations of pixels across the set, using only a few thousand pictures.
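
    A minimal sketch of the idea, with classical MDS from scikit-learn as a stand-in for the embedding method used in the paper:

        import numpy as np
        from sklearn.manifold import MDS

        def recover_pixel_layout(images):
            """images: (n_images, n_pixels), pixels in an unknown shuffled
            order. Neighboring pixels correlate strongly across images, so
            pairwise correlations define a dissimilarity that a 2-D embedding
            can unfold back into (a rotation of) the original pixel grid."""
            C = np.corrcoef(images, rowvar=False)    # pixel-pixel correlations
            D = 1.0 - C                              # high correlation = close
            mds = MDS(n_components=2, dissimilarity='precomputed')
            return mds.fit_transform(D)              # recovered 2-D positions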

    B. Kégl
    Intrinsic dimension estimation using packing numbers

    In Advances in Neural Information Processing Systems, Volume 15, pages 681–688. The MIT Press, Dec. 2002.

    A simple algorithm to measure the degrees of freedom in a multivariate data set. I have a web site with Java software. Pretty high impact considering that it's "only" a conference paper.
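
    A minimal sketch of the estimator: count greedy r-packings at two scales and read the dimension off the slope of log M(r) against log r (the paper averages over permutations of the data, which is omitted here).

        import numpy as np

        def packing_number(X, r):
            """Greedy r-packing: keep a point only if it is at least r away
            from every point kept so far; the count approximates M(r)."""
            centers = []
            for x in X:
                if all(np.linalg.norm(x - c) >= r for c in centers):
                    centers.append(x)
            return len(centers)

        def packing_dimension(X, r1, r2):
            """Since M(r) scales like r^(-d), the scale-dependent estimate is
            d = -(log M(r2) - log M(r1)) / (log r2 - log r1), with r1 < r2."""
            num = np.log(packing_number(X, r2)) - np.log(packing_number(X, r1))
            return -num / (np.log(r2) - np.log(r1))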

    B. Kégl, A. Krzyżak, T. Linder, and K. Zeger
    Learning and design of principal curves

    IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3):281–297, 2000.

    Algorithm and theory for estimating curves that go through the "middle" of a multivariate "cloud" of points. Arguably my most significant research contribution in terms of impact. I have another paper on the subject in the Applications/Image processing section, and three conference papers (not listed) that overlap with the papers and my thesis. I have a web page dedicated to the subject with Java software that works on most platforms.

    B. Kégl
    Principal Curves: Learning, Design, and Applications

    PhD thesis, Concordia University, Montreal, Canada, 1999.

Learning theory

    A. Antos, B. Kégl, T. Linder, and G. Lugosi
    Data-dependent margin-based generalization bounds for classification

    Journal of Machine Learning Research, 3:73–98, 2003.

    A. Krzyżak, J. Sąsiadek, and B. Kégl
    Non-parametric identification of dynamic non-linear systems by a Hermite series approach

    International Journal of Systems Science, 32(10):1261–1285, 2001.

    B. Kégl, A. Krzyżak, and H. Niemann
    Radial basis function networks and complexity regularization in function learning and classification

    In International Conference on Pattern Recognition, Volume 2, pages 81–86, Sept. 2000.

Machine learning applications

Music processing

Systems, software engineering

    J. Perez, C. Germain-Renaud, B. Kégl, and C. Loomis
    Multiobjective reinforcement learning for responsive grids

    Journal of Grid Computing, 8(3), 2010.

    J. Perez, C. Germain-Renaud, B. Kégl, and C. Loomis
    Responsive elastic computing

    In IEEE International Conference on Autonomic Computing, Volume 6, pages 55–64, June 2009.

    J. Perez, C. Germain-Renaud, B. Kégl, and C. Loomis
    Utility-based reinforcement learning for reactive grids

    In IEEE International Conference on Autonomic Computing, Volume 5, pages 205–206, June 2008.

    S. Bouktif, H. Sahraoui, B. Kégl, D. Azar, and D. Precup
    Improving rule set based software quality prediction: A genetic algorithm based approach

    Journal of Object Technology, 3(4):227–241, 2004.

    D. Azar, S. Bouktif, B. Kégl, H. Sahraoui, and D. Precup
    Combining and adapting software quality predictive models by genetic algorithms

    In 17th IEEE International Conference on Automated Software Engineering, pages 285–288, Sept. 2002.

    S. Bouktif, B. Kégl, and H. Sahraoui
    Combining software quality predictive models: an evolutionary approach

    In IEEE International Conference on Software Maintenance, pages 385–392, Oct. 2002.

Image processing

Experimental physics

Auger papers