Web Image Prediction Using Multivariate Point Processes

The paper and Matlab example code are now available.

A Romanian version of this page is also available. Thanks, Maxim.



  • Gunhee Kim, Li Fei-Fei and Eric P. Xing
    Web Image Prediction Using Multivariate Point Processes
    18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2012), Beijing, China, August 12-16, 2012. (Acceptance rate: 133/755 ≈ 17.6%)
    [Paper (PDF)] [Presentation (PPTX)]

Matlab example code

  1. Download our Matlab toolbox.

    1. Note that this toolbox was rewritten for instructional purposes.

  2. Please carefully read README.txt in the package before using our code.

  3. Please cite our KDD 2012 paper if you use or are inspired by this code or work.

  4. If you find any bugs, please email me.


Research Motivation

The main research problem of this project is as follows.

Given a query keyword (e.g., world+cup) and an arbitrary future time point, can we predict which images are likely to appear on the Web?

The term world+cup mainly refers to the soccer event, so soccer images may seem a reasonable guess. However, as shown in Figure 1, actual user photos are extremely diverse (e.g., skiing, skating, cycling, or horse riding), because the usage of the term varies widely with users' experiences and preferences.

As input, we download all images returned by the query world+cup from Flickr up to a certain time point (e.g., the end of 2008). We also assume that each image is associated with metadata such as a timestamp and an owner ID. We then learn a temporal model for each topic keyword that describes the relations between image occurrence probabilities and the factors, or covariates, that influence them. Finally, the learned model is used to predict likely images for a given topic keyword and future time point.

We call the prediction of photos for arbitrary individuals collective image prediction. If a particular user is specified at query time, the prediction can focus on that user's unique way of viewing the topic; we call this task personalized image prediction.


Figure 1. (a) Given an image stream for world+cup up to 12/31/2008, can we predict likely images for a future time point t_q = 6/6/2009? (c) Collective image prediction. (d) Personalized prediction for a designated user.


Our unified statistical framework for the image prediction problem is the multivariate point process, a stochastic process consisting of a series of random events that occur at points in time and space. Point processes are a powerful statistical model for spatio-temporal events.

Naturally, each occurrence of a particular image at a particular time can be represented as a point in time and image space. Consider a short stream of penguin images. Each image is associated with a timestamp and a visual cluster ID obtained by image clustering. We can then represent this stream as the discrete-time trivariate point process shown in Fig. 2(b). Finally, we formulate a regularized Poisson regression model that captures the relations between image occurrence and the covariates that influence it in a flexible, scalable, and globally optimal way.
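The conversion from a timestamped, clustered image stream into a discrete-time multivariate point process can be sketched in a few lines. This is a minimal illustration under assumed details, not code from the paper or the Matlab toolbox: the weekly bin width, the cluster numbering, the example dates, and the function name to_count_matrix are all hypothetical.

```python
from datetime import date


def to_count_matrix(stream, num_clusters, start, bin_days):
    """Bin (date, cluster_id) pairs into a discrete-time multivariate count series.

    Returns counts where counts[t][c] is the number of images of visual
    cluster c whose date falls in time bin t (bins are bin_days wide,
    starting at `start`). Empty bins are kept as all-zero rows.
    """
    binned = []
    for day, cluster in stream:
        if not 0 <= cluster < num_clusters:
            raise ValueError(f"cluster id {cluster} out of range")
        offset = (day - start).days
        if offset < 0:
            raise ValueError(f"{day} is before start date {start}")
        binned.append((offset // bin_days, cluster))
    if not binned:
        return []
    # One row per time bin up to the last observed bin, one column per cluster.
    counts = [[0] * num_clusters for _ in range(max(t for t, _ in binned) + 1)]
    for t, cluster in binned:
        counts[t][cluster] += 1
    return counts


# Hypothetical penguin stream; cluster ids loosely follow Fig. 2:
# 0 = ice hockey, 1 = animal penguin, 2 = snowy mountain.
stream = [(date(2008, 1, 3), 1), (date(2008, 1, 5), 0), (date(2008, 1, 12), 1),
          (date(2008, 1, 13), 2), (date(2008, 1, 20), 0)]
print(to_count_matrix(stream, num_clusters=3, start=date(2008, 1, 1), bin_days=7))
# -> [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
```

Keeping empty bins as all-zero rows matters: in a discrete-time point process, a bin in which no image appeared is an observation too, and dropping it would bias any rate estimated from the counts.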


Figure 2. A multivariate point process for a short penguin image stream. (a) Each image is assigned a timestamp and one of three visual clusters, which roughly correspond to ice hockey, the penguin animal, and snowy mountains. (b) The corresponding discrete-time trivariate point process.
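The regularized Poisson regression step can be sketched as below. This is a minimal illustration, not the paper's implementation or its Matlab toolbox: the lag-1 covariates (a bias plus the previous bin's counts of all clusters), the L2 penalty, the plain gradient-ascent solver, and the names lagged_features, fit_poisson_l2, and predict_rates are all assumptions made here. For each cluster c, it maximizes (1/T) Σ_t [N_c(t) θ_c·x(t) − exp(θ_c·x(t))] − (ρ/2)‖θ_c‖², the Poisson log-likelihood without its constant log N_c(t)! term plus a ridge penalty. Because this objective is strictly concave for ρ > 0, its optimum is unique and global.

```python
import math


def lagged_features(counts):
    """Covariates x(t) = [1, N_0(t-1), ..., N_{C-1}(t-1)] for t = 1..T-1.

    Hypothetical choice for illustration: each cluster's rate depends on the
    previous bin's counts of *all* clusters, which couples the C series.
    """
    return [[1.0] + [float(v) for v in counts[t - 1]] for t in range(1, len(counts))]


def fit_poisson_l2(X, y, rho=0.1, lr=0.01, iters=10000):
    """L2-regularized Poisson regression fitted by plain gradient ascent.

    Maximizes (1/T) * sum_t [y_t * w.x_t - exp(w.x_t)] - (rho/2) * ||w||^2.
    The objective is strictly concave, so with a small enough fixed step the
    iterates approach its unique global maximum.
    """
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and of equal length")
    T, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [-rho * wj for wj in w]  # gradient of the ridge penalty
        for x, yt in zip(X, y):
            rate = math.exp(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(d):
                grad[j] += (yt - rate) * x[j] / T  # averaged log-likelihood gradient
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w


def predict_rates(weights, x):
    """Predicted intensity lambda_c = exp(theta_c . x) for every cluster c."""
    return [math.exp(sum(wj * xj for wj, xj in zip(w, x))) for w in weights]


if __name__ == "__main__":
    # Toy counts for 3 clusters over 7 time bins (made-up numbers).
    counts = [[1, 2, 0], [0, 3, 0], [1, 2, 1], [0, 3, 0],
              [1, 2, 0], [0, 3, 0], [1, 2, 0]]
    X = lagged_features(counts)
    weights = [fit_poisson_l2(X, [row[c] for row in counts[1:]]) for c in range(3)]
    # Expected number of images per cluster in the next, unseen bin.
    print(predict_rates(weights, [1.0] + [float(v) for v in counts[-1]]))
```

Because the covariates in this toy model are past counts, it only forecasts one bin ahead; predicting an arbitrary future time point, as described on this page, would need covariates that are known in advance, which this sketch leaves out.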

Some examples of collective and personalized prediction are as follows.


Figure 3. Examples of collective (personalized) image prediction for the topics world+cup and cardinal (fine+art and Brazilian). In each set, we show the images predicted by our method in the top row and their matched actual images in the bottom row.

Take-home Message

In this paper, we discuss the Web image prediction problem. This research is important because it is an image-based approach to user behavior prediction and can easily be extended to time-sensitive image reranking.

From our experiments, we observe that Web image collections are extremely diverse, but some topics follow predictable patterns. Specifically, our predictive model works well for polysemous topics that show strong annual or periodic trends. We also conclude that image-based personalization is highly desirable, because images can convey subtle information about user preferences that text descriptions hardly capture. For example, for the fine+art topic, we can effectively address a question like "What styles of paintings does a user like?"


  • This research is supported by Google, ONR N000140910758, AFOSR FA9550010247, NSF IIS-0713379, and NSF DBI-0640543.

  • Li Fei-Fei is supported by a grant from NSF-IIS-1115313.