On Multiple Foreground Cosegmentation

Paper, FlickrMFC Dataset, and Matlab toolbox are now available.

People

Gunhee Kim
Eric P. Xing (CMU)

Publication

Gunhee Kim and Eric P. Xing
On Multiple Foreground Cosegmentation
25th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012). Rhode Island, USA, June 16-21, 2012. (Oral) (Acceptance = 48/1933 ~ 2.48%)
[Paper (PDF)] [Supplementary (PDF)] [Presentation (PPTX)] [Talk (Link)]

Matlab code

Download our Matlab toolbox.
1. Note that this toolbox was re-written for the instruction purpose.
2. The code includes very primitive Foreground models: (1) GMM and (2) SPM with a single level. They are much simpler than those I used in the paper.
3. We try to naively follow the algorithm presented in the paper, without including any sophisticated features that were used for experiments.
4. (Last update: 07/20/2012) Please note that this beta toolbox may be frequently updated for a while.
Please carefully read our README.txt before using our code.
Please cite our CVPR12 Paper if you use or are inspired by this code/work.
If you find any bugs, please email me.

FlickrMFC Dataset

Download FlickrMFC dataset (FlickrMFCdataset.zip).
1. Last update is July 13, 2012.
2. You can see the usage instruction and the examples of images and ground truths in the README.pdf.

Figure 3. Examples of apple+picking and cow+pasture groups in FlickrMFC dataset.

Description

Motivation of Research

This project aims at cosgementing a general user's photo stream. The following figure shows sampled images from an apple+picking photo stream of Flickr, which follow an ordinary user's photo-taking pattern.

A series of photos are taken for a specific moment.
A number of foregrounds (i.e., subjects of interest) are finite.
Each image contains an unknown subset of foregrounds.

For example, two girls (in red and blue), one baby, and apple buckets repeatedly but irregularly appear in the photo stream. Unfortunately, such a content-misaligned set of images would not be explicitly addressed yet by existing cosegmentation algorithms.

Figure 1. Motivation for multiple foreground cosegmentation.

Method

Our approach alternates between two subtasks: foreground modeling, and region assignment. The foreground modeling step learns the appearance models of K foregrounds and the background, which can be accomplished by using any existing region classifiers or their combinations. Our major technical novelty lies in the region assignment step, which is formulated as welfare maximization in combinatorial auction. We first oversegment each image into a set of multiple small segments. In analogy, given a set of segments (items), the foreground models (bidders or buyers) submit a set of foreground candidates (package bids) that they want to take. Then, a small number of feasible foreground candidates are chosen among all submitted ones in order to maximize the overall values. The general welfare maximization (or winner determination) problem is NP-complete and inapproximable. In this paper, we propose a tractable method by leveraging structural properties that are commonly observed in the image space. Please see the details in the paper or CVPR oral slides.

Some segmentation examples of Flickr and ImageNet datasets are as follows.

Figure 2. Examples of multiple foreground cosegmentation on Flickr and ImageNet datasets.

Take-home Message

We believe that this mid-level combinatorial optimization is very promising for image segmentation. Instead of relying on complicated ML algorithms or high-order models, we can just enumerate the cases (or hypotheses) of best segmentation not in a brute-force but in a smart way, by using state-of-art region descriptors and classifiers. Our previous paper in ICCV 2011, based on submodular optimization, lies in the same line of thought.

Funding

This research is supported by Google, NSF IIS-0713379, and NSF DBI-0640543.