## The greatest mathematicians of all time

Even though I truly believe math is sexy, I know putting the term “mathematician” in the title has already cost half of the readers.

For the rest half, take a look at the picture above. How many of these geniuses can you recognize?

If you can name more than half of them, bravo! Please read on: http://fabpedigree.com/james/mathmen.htm

– *Viewer discretion is advised: Once you start reading, it sucks your time like a black hole.* –

## Videos from the paths on Hua Shan

It is an amazing place. You should go there (and come home in one piece).

Here is a wiki page about Hua Shan (华山) and another video:

http://www.youtube.com/watch?v=4nq0vtU-LVc

## Some interesting papers at ICML 2011

Papers accepted at ICML 2011 are now online and available for download. There are a number of exciting papers. Krzysztof went there already on Monday. After he comes back, we will get to know more details of the conference. The talks will probably be online soon as well. I will just stay tuned. Here I list several papers (in no particular order) that I have found very interesting. BTW, I think putting the abstracts online is a good idea. ECML should do this as well, so people can better decide which papers / talks to follow.

**Multi-Label Classification on Tree- and DAG-Structured Hierarchies** by Wei Bi and James Kwok

Abstract: Many real-world applications involve multi-label classification, in which the labels are organized in the form of a tree or directed acyclic graph (DAG). However, current research efforts typically ignore the label dependencies or can only exploit the dependencies in tree-structured hierarchies. In this paper, we present a novel hierarchical multi-label classification algorithm which can be used on both tree- and DAG-structured hierarchies. The key idea is to formulate the search for the optimal consistent multi-label as the finding of the best subgraph in a tree/DAG. Using a simple greedy strategy, the proposed algorithm is computationally efficient, easy to implement, does not suffer from the problem of insufficient/skewed training data in classifier training, and can be readily used on large hierarchies. Theoretical results guarantee the optimality of the obtained solution. Experiments are performed on a large number of functional genomics data sets. The proposed method consistently outperforms the state-of-the-art method on both tree- and DAG-structured hierarchies.

Abstract: In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates, and as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.

Abstract: We address the problem of designing surrogate losses for learning scoring functions in the context of label ranking. We extend to ranking problems a notion of order preserving losses previously introduced for multiclass classification, and show that these losses lead to consistent formulations with respect to a family of ranking evaluation metrics. An order-preserving loss can be tailored for a given evaluation metric by appropriately setting some weights depending on this metric and the observed supervision. These weights, called the standard form of the supervision, do not always exist, but we show that previous consistency results for ranking were proved in special cases where they do. We then evaluate a new pairwise loss consistent with the (Normalized) Discounted Cumulative Gain on benchmark datasets.

**Learning Mallows Models with Pairwise Preferences** by Tyler Lu and Craig Boutilier

Abstract: Learning preference distributions is a key problem in many areas (e.g., recommender systems, IR, social choice). However, existing methods require restrictive data models for evidence about user preferences. We relax these restrictions by considering as data arbitrary pairwise comparisons—the fundamental building blocks of ordinal rankings. We develop the first algorithms for learning Mallows models (and mixtures) with pairwise comparisons. At the heart is a new algorithm, the generalized repeated insertion model (GRIM), for sampling from arbitrary ranking distributions. We develop approximate samplers that are exact for many important special cases—and have provable bounds with pairwise evidence—and derive algorithms for evaluating log-likelihood, learning Mallows mixtures, and non-parametric estimation. Experiments on large, real-world datasets show the effectiveness of our approach.

**Online AUC Maximization** by Peilin Zhao, Steven Hoi, Rong Jin, and Tianbao Yang

Abstract: Most studies of online learning measure the performance of a learner by classification accuracy, which is inappropriate for applications where the data are unevenly distributed among different classes. We address this limitation by developing online learning algorithm for maximizing Area Under the ROC curve (AUC), a metric that is widely used for measuring the classification performance for imbalanced data distributions. The key challenge of online AUC maximization is that it needs to optimize the pairwise loss between two instances from different classes. This is in contrast to the classical setup of online learning where the overall loss is a sum of losses over individual training examples. We address this challenge by exploiting the reservoir sampling technique, and present two algorithms for online AUC maximization with theoretic performance guarantee. Extensive experimental studies confirm the effectiveness and the efficiency of the proposed algorithms for maximizing AUC.

*And of course the one from our group, which is co-authored by Wojciech, Krzysztof and Eyke:*

Abstract: Minimization of the rank loss or, equivalently, maximization of the AUC in bipartite ranking calls for minimizing the number of disagreements between pairs of instances. Since the complexity of this problem is inherently quadratic in the number of training examples, it is tempting to ask how much is actually lost by minimizing a simple univariate loss function, as done by standard classification methods, as a surrogate. In this paper, we first note that minimization of 0/1 loss is not an option, as it may yield an arbitrarily high rank loss. We show, however, that better results can be achieved by means of a weighted (cost-sensitive) version of 0/1 loss. Yet, the real gain is obtained through margin-based loss functions, for which we are able to derive proper bounds, not only for rank risk but, more importantly, also for rank regret. The paper is completed with an experimental study in which we address specific questions raised by our theoretical analysis.

If you are interested in surrogate losses used in the instance ranking problem, it will be a very worth read 🙂

## A repository of datasets for monotone learning

We are announcing a monotone learning data repository. The datasets can be downloaded at our KEBI website:

http://www.uni-marburg.de/fb12/kebi/research/repository/monodata

Monotone learning is an important research task in machine learning community and has a wide range of applications. Often, monotone learners lead to more satisfactory predictive performance; and sometimes monotonicity is even required for applications in medical diagnosis, security system design, etc. For example, a system to predict the patient’s overall wellness should have an outcome that is monotonic with respect to the toxicity measure of that patient. No medical doctor will accept a learning model violating this monotonicity.

In our group, we are working on some new ideas for monotone learning. Surprisingly, during our ongoing research we’ve found, despite a tremendous interest on this topic, there is still no public available dataset collection for monotone learning. We’ve worked hard on collecting some benchmark datasets, and we believe it is beneficial to all researchers interested in this topic that we make our collection publicly available.

This datasets collection is still preliminary. There are a number of points on our todo list. In case you have any comment, or if yourself have a monotone learning dataset that you would like to add, please let us know.