Executive Summary
Aiming to be “the Earth’s most customer-centric company,” Amazon.com is engaged in an ongoing effort to refine its online personalization capabilities. Chief among these is the ability to offer accurate product recommendations by comparing a customer’s browsing and purchasing profile with those of other customers. Amazon is not alone in this endeavor; an entire industry has emerged to help firms match customers with the “right” product, and “product recommendation systems” now form the backbone of some of the Internet economy’s largest and most venerable firms. These systems rely on the “power of many,” leveraging data on prior customers’ product ratings to generate suggestions for current customers, typically in real time. A key feature of recommendation system data is an exceptionally large proportion of missing values; few customers rate more than a handful of items. To date, all statistical models for online recommendations have treated missing ratings as missing completely at random, implicitly assuming either that the missing data lack meaningful patterns or that any such patterns can be ignored when improving the quality of predicted ratings.
For the EachMovie data set, which has been widely used in prior studies, the authors find that missing data are strongly nonignorable. Recommendation quality can be improved substantially through a joint model of both “selection” and “ratings,” that is, of whether an item is rated and how it is rated. Indeed, the authors find that in various holdout samples, errors in predicted ratings can often be reduced by 10% or more by carefully modeling missing data. This improvement holds in comparisons with both restricted variants of the proposed model and alternative models proposed in the literature. The authors also find that the qualities that dispose a movie to be rated at all can differ systematically from those that dispose it to be rated well; for example, although “classic” movies are seldom rated, they are nevertheless rated highly, whereas “action” movies display the opposite pattern. Furthermore, such relationships can vary with customer demographics.
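Why does it matter whether missing ratings are ignorable? The following is a minimal sketch under illustrative assumptions not taken from the study: for each hypothetical customer-movie pair, draw a latent rating propensity and a latent propensity to rate at all from a standard bivariate normal with correlation `rho`, and observe the rating only when the selection propensity is positive. The function name `simulate_observed_mean`, the zero threshold, the sample size, and the `rho` values are all hypothetical. With `rho = 0` the ratings are missing completely at random and the observed mean tracks the full mean; with `rho != 0` the observed ratings are a biased sample.

```python
import math
import random
import statistics


def simulate_observed_mean(rho: float, n: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Toy selection model (an illustrative assumption, not the study's specification).

    For each hypothetical customer-item pair, draw correlated standard normals:
      z_rate   -> latent rating propensity
      z_select -> latent propensity to rate the item at all
    The rating is observed only when z_select > 0.
    Returns (mean of all latent ratings, mean of observed latent ratings).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)  # seeded so results are reproducible
    all_ratings: list[float] = []
    observed: list[float] = []
    for _ in range(n):
        z_rate = rng.gauss(0.0, 1.0)
        # Mix in independent noise so corr(z_rate, z_select) == rho.
        z_select = rho * z_rate + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        all_ratings.append(z_rate)
        if z_select > 0.0:  # selection step: only these ratings are "seen"
            observed.append(z_rate)
    return statistics.fmean(all_ratings), statistics.fmean(observed)


if __name__ == "__main__":
    for rho in (-0.6, 0.0, 0.6):
        full, obs = simulate_observed_mean(rho)
        print(f"rho={rho:+.1f}  full mean={full:+.3f}  observed mean={obs:+.3f}")
```

In this toy setup the bias has a closed form, E[z_rate | z_select > 0] = rho·sqrt(2/π) ≈ 0.80·rho, so the simulated observed means should land near ±0.48 for rho = ±0.6 while the full means stay near zero. A negative `rho` loosely mirrors the “classic” movie pattern (seldom rated, yet rated highly); the study's model is richer, adding ordinal ratings, heterogeneity, and customer characteristics such as demographics.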
Empirical results demonstrate that four modeling constructs together can substantially improve the accuracy of product recommendations: a nonignorable missing data mechanism, an individual-level account of the ordinal nature of ratings data, a reasonably sophisticated heterogeneity specification, and correlation between the underlying selection and ratings generation processes. Various comparisons show that the full model, which includes selection and prediction components in addition to heterogeneity, consistently outperforms alternative models in the quality of its recommendations.
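One common way to give ratings an ordinal, individual-level treatment is an ordered probit: a latent rating `mu + e`, with standard normal noise `e`, is cut into K categories by increasing cutpoints, and letting each customer have his or her own cutpoints captures differences in how people use the rating scale. The sketch below assumes that specification, a hypothetical 5-point scale, and made-up cutpoints and item means; it is not necessarily the exact formulation used in the study, and the function names are hypothetical. Once such parameters are estimated, scoring an item takes only a few closed-form normal CDF evaluations.

```python
import math
from typing import Sequence


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function (handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probs(mu: float, cutpoints: Sequence[float]) -> list[float]:
    """P(rating = k) for k = 1..K under an ordered probit (illustrative assumption).

    The latent rating is mu + e with e ~ N(0, 1); the observed rating is k when
    the latent value falls between the (k-1)-th and k-th bounds, where the bounds
    are -inf, the cutpoints, and +inf. Per-customer cutpoints let customers use
    the scale differently.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    bounds = [-math.inf, *cutpoints, math.inf]
    return [normal_cdf(hi - mu) - normal_cdf(lo - mu) for lo, hi in zip(bounds, bounds[1:])]


def expected_rating(mu: float, cutpoints: Sequence[float]) -> float:
    """Expected rating on a 1..K scale: a closed-form score usable for ranking."""
    probs = ordered_probit_probs(mu, cutpoints)
    return sum(k * p for k, p in enumerate(probs, start=1))


if __name__ == "__main__":
    cuts = [-1.5, -0.5, 0.5, 1.5]  # hypothetical cutpoints for one customer
    item_means = {"item_a": 0.8, "item_b": -0.2, "item_c": 1.4}  # hypothetical latent means
    ranked = sorted(item_means, key=lambda i: expected_rating(item_means[i], cuts), reverse=True)
    print(ranked)  # ['item_c', 'item_a', 'item_b']
```

Because the expected rating increases with `mu` for fixed cutpoints, ranking by `expected_rating` preserves the ordering of latent means for a single customer; the ordinal layer matters more when predicting the probability of a specific rating (say, the top category) or when cutpoints differ across customers.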
Besides accommodating customers’ decisions about whether to evaluate a specific product, the model in this study offers other advantages. Of particular importance is speed: although estimation can be complex, recommendations follow from the estimated parameters almost instantaneously. Consequently, models that accommodate missing data can be estimated periodically but used intensively in real time as customers browse and seek information.
Biographies
Yuanping Ying is Assistant Professor of Marketing in the School of Management at the University of Texas at Dallas. She obtained her PhD from the Ross School of Business at the University of Michigan. Her primary research interests are modeling consumer behavior in online environments and applying Bayesian statistics in marketing.
Fred Feinberg is Hallman Fellow and Bank One Corporation Associate Professor of Marketing in the Ross School of Business at the University of Michigan. He previously taught at the University of Toronto and Duke University, and he completed his PhD at the Massachusetts Institute of Technology’s Sloan School of Management. He has worked primarily on control-theoretic models of advertising response and econometric models of individual-level choice processes, with a particular recent emphasis on categorical data models, ranking, and Bayesian analysis. Examples of his work appear in Journal of Marketing Research, Management Science, and Marketing Science, on whose editorial board he serves. He is also senior editor for marketing at POMS.
Michel Wedel earned his MSc in Biomathematics from the University of Leiden (1981), his MSc in Statistics from the Netherlands Statistical Society (1986), and his PhD in Marketing from Wageningen University (1990). He is the Dwight F. Benton Professor of Marketing in the Ross School of Business at the University of Michigan and is Honorary Professor of Marketing at the University of Groningen. His main research interests are in marketing research methodology and the application of statistical and econometric methods to marketing problems. His work has appeared in International Journal of Research in Marketing, Journal of Econometrics, Journal of Applied Econometrics, Journal of Classification, Journal of Business and Economic Statistics, Journal of Consumer Research, Journal of Marketing Research, Journal of Marketing, Marketing Letters, Management Science, Marketing Science, and Psychometrika, among others. Professor Wedel serves on the editorial boards of International Journal of Research in Marketing, Journal of Marketing Research, and Journal of Marketing, and he is area editor for Marketing Science.
Journal of Marketing Research, Volume 43, Number 3, August 2006