Data, Knowledge & Life

Weiwei Cheng's blog

My talk at NIPS 2011 workshop


My talk at the NIPS 2011 workshop on Choice Model and Preference Learning, “Label Ranking with Abstention: Predicting Partial Orders by Thresholding Probability Distributions”, is now available at VideoLectures.net.

Written by Weiwei

24/01/2012 at 14:07

Posted in Academia

The Coded Language of the News


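"Cordial and friendly talks": to be taken literally;
"Frank talks": the disagreements are large and no communication is possible;
"Exchanged views": each side said its own piece and no agreement was reached;
"Fully exchanged views": the two sides could not agree and quarreled fiercely;
"Enhanced mutual understanding": the two sides disagree widely;
"The talks were beneficial": the two sides' goals are still far apart for now, so merely sitting down to talk is already good;
"We have reservations": we refuse to agree;
"Respect": do not fully agree;
"Appreciate": do not entirely agree;
"Regret": dissatisfaction;
"Unpleasant": a fierce conflict;
"Express great indignation": right now there is nothing I can do about you;
"Gravely concerned": may intervene;
"Cannot ignore this": about to intervene;
"Reserve the right to respond further": we will retaliate;
"We will reconsider our position on this issue": we have already changed our former (friendly) policy;
"We will wait and see": final warning;
"Please reply by day X of month X": after that date our two countries may no longer be at peace;
"** will be held responsible for the consequences arising from this": our country may resort to force if possible (though this may also be a stock phrase used as a bluff);
"This is something we absolutely cannot tolerate": war is imminent;
"This is an unfriendly act": this is an act hostile to us, one that may lead to war;
"If this can be tolerated, what cannot?": we do not intend to tolerate it any longer and are about to strike;
"Rein in at the edge of the cliff": do you want to be XX'd?
"Do not say we did not warn you": we are about to unleash our ultimate move!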

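Historically, the "little white rabbit" has played its ultimate move, "Do not say we did not warn you", twice: in the People's Daily editorial "If This Can Be Tolerated, What Cannot?" of 22 September 1962, and in the People's Daily editorial "There Is a Limit to Our Forbearance" of 25 December 1978. The targets were India and Vietnam, respectively. Since the turn to keeping a low profile and developing the economy, the phrase has never been used on a formal occasion.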

Reposted from the Kaidi community (凯迪社区), with minor edits.

Written by Weiwei

11/01/2012 at 14:54

Posted in Reposts

XRCE is looking for an intern on Bayesian preference learning


Xerox Research Centre Europe is looking for an intern to work on Bayesian preference learning. It is a great opportunity for anyone interested in doing fantastic research at a fantastic place.

Check the details here. And have fun in France!

Thanks to Shengbo for sharing this news with me.

Written by Weiwei

24/11/2011 at 03:48

Posted in Academia

My talk at ECMLPKDD 2011


My ECMLPKDD 2011 talk on “Learning Monotone Nonlinear Models using the Choquet Integral” is now available at VideoLectures.net. Check it out, and let me know what you think.

Written by Weiwei

14/11/2011 at 02:30

Posted in Academia

The greatest mathematicians of all time


Even though I truly believe math is sexy, I know that putting the term “mathematician” in the title has already cost me half of my readers.

For the other half, take a look at the picture above. How many of these geniuses can you recognize?

If you can name more than half of them, bravo! Please read on: http://fabpedigree.com/james/mathmen.htm

- Viewer discretion is advised: once you start reading, it sucks up your time like a black hole. -

Written by Weiwei

05/11/2011 at 02:45

Posted in Miscellany

China Scholars Abroad (神州学人), Issue 9, 2011


Weiwei in Heidelberg

Issue 9 (2011) of China Scholars Abroad (神州学人) carries an article about me. See page 38 for details.

Written by Weiwei

23/09/2011 at 10:39

Posted in Miscellany

Videos from the paths on Hua Shan


It is an amazing place. You should go there (and come home in one piece).

Here is a wiki page about Hua Shan (华山) and another video:

http://www.youtube.com/watch?v=4nq0vtU-LVc

Written by Weiwei

11/07/2011 at 22:27

Posted in Miscellany

Some interesting papers at ICML 2011


Papers accepted at ICML 2011 are now online and available for download, and a number of them are exciting. Krzysztof already went there on Monday; once he is back, we will hear more details about the conference. The talks will probably be online soon as well, so I will stay tuned. Below I list several papers (in no particular order) that I found very interesting. BTW, I think putting the abstracts online is a good idea. ECML should do this as well, so people can better decide which papers and talks to follow.

Multi-Label Classification on Tree- and DAG-Structured Hierarchies by Wei Bi and James Kwok

Abstract: Many real-world applications involve multi-label classification, in which the labels are organized in the form of a tree or directed acyclic graph (DAG). However, current research efforts typically ignore the label dependencies or can only exploit the dependencies in tree-structured hierarchies. In this paper, we present a novel hierarchical multi-label classification algorithm which can be used on both tree- and DAG-structured hierarchies. The key idea is to formulate the search for the optimal consistent multi-label as the finding of the best subgraph in a tree/DAG. Using a simple greedy strategy, the proposed algorithm is computationally efficient, easy to implement, does not suffer from the problem of insufficient/skewed training data in classifier training, and can be readily used on large hierarchies. Theoretical results guarantee the optimality of the obtained solution. Experiments are performed on a large number of functional genomics data sets. The proposed method consistently outperforms the state-of-the-art method on both tree- and DAG-structured hierarchies.

Classification-Based Policy Iteration with a Critic by Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, and Bruno Scherrer

Abstract: In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates, and as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.

Learning Scoring Functions with Order-Preserving Losses and Standardized Supervision by David Buffoni, Clément Calauzenes, Patrick Gallinari, and Nicolas Usunier

Abstract: We address the problem of designing surrogate losses for learning scoring functions in the context of label ranking. We extend to ranking problems a notion of order preserving losses previously introduced for multiclass classification, and show that these losses lead to consistent formulations with respect to a family of ranking evaluation metrics. An order-preserving loss can be tailored for a given evaluation metric by appropriately setting some weights depending on this metric and the observed supervision. These weights, called the standard form of the supervision, do not always exist, but we show that previous consistency results for ranking were proved in special cases where they do. We then evaluate a new pairwise loss consistent with the (Normalized) Discounted Cumulative Gain on benchmark datasets.

Learning Mallows Models with Pairwise Preferences by Tyler Lu and Craig Boutilier

Abstract: Learning preference distributions is a key problem in many areas (e.g., recommender systems, IR, social choice). However, existing methods require restrictive data models for evidence about user preferences. We relax these restrictions by considering as data arbitrary pairwise comparisons—the fundamental building blocks of ordinal rankings. We develop the first algorithms for learning Mallows models (and mixtures) with pairwise comparisons. At the heart is a new algorithm, the generalized repeated insertion model (GRIM), for sampling from arbitrary ranking distributions. We develop approximate samplers that are exact for many important special cases—and have provable bounds with pairwise evidence—and derive algorithms for evaluating log-likelihood, learning Mallows mixtures, and non-parametric estimation. Experiments on large, real-world datasets show the effectiveness of our approach.

Online AUC Maximization by Peilin Zhao, Steven Hoi, Rong Jin, and Tianbao Yang

Abstract: Most studies of online learning measure the performance of a learner by classification accuracy, which is inappropriate for applications where the data are unevenly distributed among different classes. We address this limitation by developing online learning algorithm for maximizing Area Under the ROC curve (AUC), a metric that is widely used for measuring the classification performance for imbalanced data distributions. The key challenge of online AUC maximization is that it needs to optimize the pairwise loss between two instances from different classes. This is in contrast to the classical setup of online learning where the overall loss is a sum of losses over individual training examples. We address this challenge by exploiting the reservoir sampling technique, and present two algorithms for online AUC maximization with theoretic performance guarantee. Extensive experimental studies confirm the effectiveness and the efficiency of the proposed algorithms for maximizing AUC.
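The abstract above mentions reservoir sampling. For readers who have not seen it, here is a minimal sketch of the classic version of that technique, which keeps a fixed-size uniform sample of a stream whose length is not known in advance. This is only an illustration of the general idea, not the authors' algorithm; the function name `reservoir_sample` and its parameters are my own assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Hypothetical helper for illustration only: classic reservoir sampling,
    not the online AUC method from the paper.
    """
    rng = rng or random.Random()
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number t+1 replaces a stored item with probability k/(t+1).
            j = rng.randint(0, t)  # both endpoints inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1000), 5, random.Random(0)))
```

The appeal in an online setting is that memory stays bounded by k no matter how long the stream runs.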

And of course the one from our group, which is co-authored by Wojciech, Krzysztof and Eyke:

Bipartite Ranking through Minimization of Univariate Loss by Wojciech Kotlowski, Krzysztof Dembczynski and Eyke Hüllermeier

Abstract: Minimization of the rank loss or, equivalently, maximization of the AUC in bipartite ranking calls for minimizing the number of disagreements between pairs of instances. Since the complexity of this problem is inherently quadratic in the number of training examples, it is tempting to ask how much is actually lost by minimizing a simple univariate loss function, as done by standard classification methods, as a surrogate. In this paper, we first note that minimization of 0/1 loss is not an option, as it may yield an arbitrarily high rank loss. We show, however, that better results can be achieved by means of a weighted (cost-sensitive) version of 0/1 loss. Yet, the real gain is obtained through margin-based loss functions, for which we are able to derive proper bounds, not only for rank risk but, more importantly, also for rank regret. The paper is completed with an experimental study in which we address specific questions raised by our theoretical analysis.

If you are interested in surrogate losses used in the instance ranking problem, it is well worth a read :)
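To make the link between rank loss and AUC from the abstract concrete, here is a small sketch that counts disagreements over all positive/negative pairs. It is a naive quadratic computation for illustration, not the method of the paper; the function names are hypothetical, labels are assumed to be 0/1, and counting ties as half a disagreement is a common convention that I am assuming here.

```python
def rank_loss(scores, labels):
    """Fraction of (positive, negative) pairs that are ranked the wrong way.

    Assumes labels are 0 or 1; ties count as half a disagreement.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    disagreements = 0.0
    for sp in pos:          # quadratic loop over all pairs
        for sn in neg:
            if sp < sn:
                disagreements += 1.0
            elif sp == sn:
                disagreements += 0.5
    return disagreements / (len(pos) * len(neg))


def auc(scores, labels):
    """AUC expressed as one minus the rank loss."""
    return 1.0 - rank_loss(scores, labels)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    labels = [1, 0, 1, 0]
    print(rank_loss(scores, labels), auc(scores, labels))  # 0.25 0.75
```

The double loop is exactly the quadratic cost the abstract refers to, which is what makes a univariate surrogate attractive.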

Written by Weiwei

27/06/2011 at 18:20

Posted in Academia

A repository of datasets for monotone learning


We are announcing a monotone learning data repository. The datasets can be downloaded from our KEBI website:

http://www.uni-marburg.de/fb12/kebi/research/repository/monodata

Monotone learning is an important research task in the machine learning community and has a wide range of applications. Monotone learners often lead to more satisfactory predictive performance, and sometimes monotonicity is even required, for example in medical diagnosis or security system design. For instance, a system that predicts a patient’s overall wellness should have an outcome that is monotonic with respect to the toxicity measure of that patient. No medical doctor will accept a learning model that violates this monotonicity.
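As a rough illustration of what monotonicity means for data, the sketch below lists pairs of examples that violate a non-decreasing relationship: one example is at least as large as another in every feature, yet has a strictly smaller output. The function names are hypothetical, and the code assumes every feature is meant to act in the increasing direction (a decreasing one, such as toxicity against wellness, could be negated first).

```python
from itertools import combinations


def dominates(a, b):
    """True if a is greater than or equal to b in every feature."""
    return all(x >= y for x, y in zip(a, b))


def monotonicity_violations(X, y):
    """Return pairs (i, j) where X[i] dominates X[j] but y[i] < y[j].

    Illustrative helper only; quadratic in the number of examples.
    """
    violations = []
    for i, j in combinations(range(len(X)), 2):
        if dominates(X[i], X[j]) and y[i] < y[j]:
            violations.append((i, j))
        elif dominates(X[j], X[i]) and y[j] < y[i]:
            violations.append((j, i))
    return violations


if __name__ == "__main__":
    X = [[1, 1], [2, 2], [3, 1]]
    print(monotonicity_violations(X, [0, 1, 0]))  # []
    print(monotonicity_violations(X, [1, 0, 0]))  # [(1, 0), (2, 0)]
```

The same check can be run on a model's predictions instead of the observed outputs to see whether a learned model respects the constraint on a given sample.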

In our group, we are working on some new ideas for monotone learning. Surprisingly, during our ongoing research we’ve found that, despite tremendous interest in this topic, there is still no publicly available dataset collection for monotone learning. We’ve worked hard on collecting some benchmark datasets, and we believe that making our collection publicly available will benefit all researchers interested in this topic.

This dataset collection is still preliminary, and there are a number of points on our to-do list. If you have any comments, or if you have a monotone learning dataset that you would like to add, please let us know.

Written by Weiwei

05/05/2011 at 00:23

Posted in Academia

Java 4-Ever Trailer


Can you handle this much geekiness?


Written by Weiwei

25/03/2011 at 00:43

Posted in Miscellany
