
Challenges of Collaborative Filtering

Previously, Vincent wrote about collaborative filtering here on Tech It Easy and made a really good business case on the topic of user-generated content (UGC) versus expert input. Here, I’ll go a bit deeper into how collaborative filtering is done and what the challenges are.

For simplicity, I have divided the ways to filter into two. There’s the Pandora way, where the approach is that a song can be described by about 150 different genes and recommendations are (in very simple terms) other songs in the same neighborhood of that multi-dimensional space. To do this accurately, they use expert opinions. Then there’s the Amazon, Last.fm, Netflix et al. way of clustering users with similar histories and recommending what other people in that cluster have liked.
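To make the "neighborhood in gene space" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the song names, the three made-up genes (the real thing would have far more), the Euclidean distance and the `radius` parameter are all hypothetical, and this is not Pandora's actual method.

```python
import math

# Hypothetical catalog: each song is scored by "experts" on three made-up genes (0..1).
songs = {
    "song_a": [0.9, 0.1, 0.3],
    "song_b": [0.8, 0.2, 0.4],
    "song_c": [0.1, 0.9, 0.7],
    "song_d": [0.85, 0.15, 0.35],
}


def distance(a, b):
    """Euclidean distance between two gene vectors (an assumed choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbors(seed, catalog, radius):
    """Return the other songs within `radius` of the seed song, closest first."""
    seed_vector = catalog[seed]
    hits = [
        (distance(seed_vector, vector), name)
        for name, vector in catalog.items()
        if name != seed
    ]
    return [name for dist, name in sorted(hits) if dist <= radius]


print(neighbors("song_a", songs, radius=0.2))  # ['song_d', 'song_b']
```

Note that no user data appears anywhere in the sketch. The only knob is how large `radius` should be.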

The huge difference in these approaches is best illustrated by the fact that for the Pandora way to work, you don’t actually need any users. The expert’s role in the latter way is to somehow come up with a way to model these clusters accurately.

The latter is much more interesting, because it’s always a challenge to infer anything from user data. The Pandora way’s “only” major challenge is the assumption that people like similar things (i.e., how big the searched neighborhood should be).

The other main reason for interest in the Amazon/Netflix way is, of course, money. The $1,000,000 Netflix Prize is, simply put, a hunt for a certain RMSE (root mean squared error). When described this way, some interesting questions arise.
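For reference, RMSE is the square root of the average squared difference between predicted and actual ratings. A minimal sketch follows; the ratings in it are invented for illustration and have nothing to do with the real Netflix data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical star ratings and the predictions made for them.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]

print(rmse(predicted, actual))  # 0.75
```

Because the errors are squared before averaging, one prediction that is off by two stars hurts more than two predictions that are each off by one.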

 

For the record, I liked Napoleon Dynamite

One question I think is important is what the theoretical limit for accuracy is in Netflix’s case. To get there, let’s assume that all users at Netflix rate, fully rationally and on a cardinal scale, all and only the movies they have seen. That’s a pretty heavy load of assumptions, and I’m pretty sure that’s not all of them. That’s why, even if Netflix could accurately forecast the data, it wouldn’t mean the forecast mirrors users’ true preferences. So what this upper limit on accuracy, or lower limit on RMSE, actually is in Netflix’s case is a good question.

 

For these reasons it shouldn’t be surprising that “just a guy in a garage”, a psychologist employing behavioral decision making assumptions instead of hard rationality, could score so well in the Netflix Prize. A pretty good story on that was in Wired a while ago.

For the reasons above, it’s also pretty backwards to think that the problem is fitting the data into the algorithm, so I wouldn’t really call it a “Napoleon Dynamite problem” as the NY Times did recently. But do note that the “Pragmatic Theory” team interviewed in this article, just like “just a guy in a garage”, didn’t actually invent anything new. They just realized they could use a method that people didn’t know or had forgotten about, in this case singular value decomposition. A closely related method is principal component analysis, which is available in pretty much any statistical software package (no, Excel doesn’t count) (and yes, you could think of the Pandora way as something similar to factor analysis).
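To give a feel for what singular value decomposition does with ratings, here is a minimal sketch that finds only the single strongest "taste dimension" of a small rating matrix, using plain power iteration. The big assumption is that every user has rated every movie. Real rating data is sparse, and as far as I know the contest entries used variants adapted to missing ratings, so this is not any team's actual method. The function names and the ratings are hypothetical.

```python
import math


def leading_singular_triplet(matrix, iterations=100):
    """Return (u, sigma, v) for the largest singular value via power iteration.

    Assumes a non-empty, non-zero matrix given as a list of equally long rows.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # arbitrary unit-length start vector
    for _ in range(iterations):
        # One power-iteration step on A^T A: v <- normalize(A^T (A v))
        av = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        atav = [sum(matrix[i][j] * av[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    av = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in av))  # the singular value is |A v|
    u = [x / sigma for x in av]
    return u, sigma, v


def rank_one_approximation(matrix):
    """Best rank-1 reconstruction of the matrix: sigma * u * v^T."""
    u, sigma, v = leading_singular_triplet(matrix)
    return [[sigma * ui * vj for vj in v] for ui in u]


# Hypothetical ratings: rows are users, columns are movies.
ratings = [
    [5, 4, 1],
    [4, 5, 2],
    [1, 2, 5],
]

for row in rank_one_approximation(ratings):
    print([round(x, 2) for x in row])
```

Keeping more than one singular triplet gives a better reconstruction. The modelling question is how many dimensions to keep before you start fitting noise.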

One difficulty in Netflix’s case pretty much boils down to what’s in a number. Remember that in this case the teams work only on user rating data, but they are of course free to add more data from other sources as well. This doesn’t change the fact that the only user data they have are users’ ratings.

As a side note, I guess that one reason demographics aren’t used is legal issues. Vince pointed out that things like the “Napoleon Dynamite problem” could be solved with more data like demographics and mood. Now, usually more data just means more problems, but let’s forget about that for this discussion.

On this topic, I recently listened to a really interesting lecture about modern consumer analysis by Petri Vasara from Pöyry consulting. They had come up with a neat tool, ConsuNaut (PDF), to show what certain segments are doing at what times (compared to the old “your target audience watches TV x hours a day” way), what their mood was, etc. One “press release friendly” finding of this tool is that the Global Rush Hour, or when most of the world’s people are commuting, is at 18-19 Finnish time (UTC+2).

Anyway, back to the topic. What I also see as a problem is the actual “forecasting” part. Now, this doesn’t affect Netflix that much, because I assume that it is in their interest to get customers to rent any movies at all, even, to use the out-of-fashion term, “long tail” ones. Even more so if there are inventory costs involved. But what happens when a new movie enters the pool? Remember that for clustering to work, there has to be data, which is pretty sparse for a new movie. How long does it take for a new movie’s recommendations to be accurate, and how does it affect other recommendations?
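The sparse-data problem is easy to see in a toy version of the "people like you liked this" approach. The sketch below is an assumption-laden illustration, not Cinematch: the users, movies and ratings are made up, similarity is a plain cosine over the movies two users have both rated, and a prediction is a similarity-weighted average of other users' ratings.

```python
import math

# Hypothetical ratings (1-5 stars); a missing key means "not rated".
ratings = {
    "ann": {"bond": 4, "up": 5, "wall_e": 5},
    "bob": {"bond": 5, "transporter_2": 4, "up": 2},
    "cara": {"bond": 4, "up": 5, "wall_e": 4, "shrek": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity over the movies both users have rated (0.0 if none)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(user, movie, all_ratings):
    """Similarity-weighted average of other users' ratings, or None without data."""
    weighted_sum, weight_total = 0.0, 0.0
    for other, their_ratings in all_ratings.items():
        if other == user or movie not in their_ratings:
            continue
        similarity = cosine_similarity(all_ratings[user], their_ratings)
        weighted_sum += similarity * their_ratings[movie]
        weight_total += similarity
    return weighted_sum / weight_total if weight_total > 0 else None


print(predict("ann", "wall_e", ratings))       # based on Cara, the only other rater
print(predict("ann", "new_release", ratings))  # None
```

The last call returns `None` because nobody has rated the new movie yet. A movie with a single rater is not much better off, since the "prediction" is then just that one person's opinion.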

In other words, how stable is the solution to the problem? How does seeing the latest James Bond, because everyone goes to see that, change the recommendations for someone who doesn’t like other action movies? Is he recommended Transporter 2? Is a fan of Pixar movies offered Disney’s children’s animations, or worse yet, DreamWorks’ animations?

Not Madagascar 2

So, while the Netflix way is about fitting data and finding clusters, Pandora bases its approach on the assumption that all music can be labeled accurately and objectively. The main criticism against this approach, in my opinion, is the post-modern philosophy of subjectivity. Is there really one truth? (Also, how many genes does it take?)

I attended a guest lecture by Andrzej P. Wierzbicki on “The Problem of Objective Ranking: Foundations, Approaches and Applications”, where he, for example, discussed the “dangers and errors of the subjectivist reduction of objectivity to power and money”. So he was painting with a broad brush, but there were lots of gems. He also noted that intersubjective rational ranking is difficult and full objectivity is impossible, which should demotivate the Pandora crowd a little.

So, what might on the surface look like a statistical challenge is deep down much more cross-disciplinary, and it goes all the way to our assumptions about reality. This is why it is important to keep in mind the most important thing, the end of all this: the business angle. It is not in Netflix’s or Pandora’s interest to predict anything with 100% accuracy. They only need to do it well enough. Well, not in Netflix’s anyway. The whole reason for improving Cinematch is purely economic: they have found out that people actually rent more if the recommendations are good (enough). There’s a reason they’re offering one million dollars for a 10% improvement. I’d love to know how quickly that million pays itself back.

And, really, let’s face it. Most of the collaborative filtering things today are just toys, so none of this really matters. There are a lot of assumptions and approximations, and the results are good enough for the purpose. For example, iTunes’ Genius is certainly flawed and limited, but it’s way better than normal random or shuffle play. But if you want to go that extra mile, then you see that the challenge gets exponentially more difficult.

To top it off, in the end there’s the age-old problem of optimization, which is that on average the solutions are “good”, but not “interesting” and definitely not varied. But to add “interestingness” we have to add uncertainty, and that’s a whole new world of pain (the Allais paradox being the least of it)… but risk should have its rewards, shouldn’t it?

Kari Silvennoinen is a Ph.D. student at Helsinki School of Economics and is currently working on behavioral decision making topics.
