Recommender Systems

(Summarized notes from Recommender Systems (introductory chapter) and survey papers)

The basic idea in recommender systems is to use different sources of feedback data to infer customer interests. This feedback can be explicit, such as user ratings (likes/dislikes, 1-5 stars, etc.), or implicit, based on click or purchasing behavior. In the most common formulation, the entities of interest are:

User – the entity to which the recommendation is provided
Item – the entity or product being recommended

The recommendations are based on previous interactions between users and items. An exception is knowledge-based recommender systems, where the recommendation is based on user-specified requirements rather than history.

The basic principle of recommendation algorithms is that significant dependencies exist between user-centric and item-centric activity. Various categories of items are correlated, and individual items may be linked to each other. Similarly, the interests and actions of users can be used to create cohorts of similar users. These dependencies can be learned in a data-driven manner from the ratings matrix and used to make recommendations to the target user.

Types of Recommender Systems

There are three basic types of recommender system models:

Collaborative Filtering: Uses ratings from multiple users to predict missing ratings.
Content-Based Models: User interests are modeled on the attributes of the items the user has rated or accessed in the past.
Knowledge-Based Systems: User-specified interests are combined with domain knowledge to provide recommendations.

Hybrids of the basic types are also found in practice.

Problem Statement

Prediction Version (matrix completion problem)

Predict the rating value for a user-item combination.
Training data is available as an incomplete m x n ratings matrix (m users, n items).
The specified (observed) values are used for training.
The goal is to predict the missing (unobserved) values.
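As a concrete picture of the prediction version, the sketch below stores a toy ratings matrix with None for unspecified entries, holds out a few observed entries for evaluation, and fills every missing entry with a deliberately simple baseline (each user's mean training rating, falling back to the global mean). The matrix values, the hold-out fraction, and the baseline itself are illustrative assumptions, not a method from the notes.

```python
import math
import random

# Toy 4 x 5 ratings matrix (4 users x 5 items); None marks an unspecified rating.
# The values are made up for illustration.
R = [
    [5, 3, None, 1, None],
    [4, None, None, 1, 2],
    [1, 1, None, 5, 4],
    [None, 1, 5, 4, None],
]

# Specified (observed) entries as (user, item, rating) triples.
observed = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r is not None]

# Hold out about a quarter of the observed entries to evaluate predictions on known values.
rng = random.Random(0)
rng.shuffle(observed)
n_test = len(observed) // 4
test, train = observed[:n_test], observed[n_test:]

def fit_user_means(train):
    """Baseline: predict a user's mean training rating (global mean if the user has none)."""
    global_mean = sum(r for _, _, r in train) / len(train)
    sums, counts = {}, {}
    for u, _, r in train:
        sums[u] = sums.get(u, 0.0) + r
        counts[u] = counts.get(u, 0) + 1
    means = {u: sums[u] / counts[u] for u in sums}
    return lambda u, i: means.get(u, global_mean)

predict = fit_user_means(train)
rmse = math.sqrt(sum((predict(u, i) - r) ** 2 for u, i, r in test) / len(test))
print(f"train={len(train)} test={len(test)} RMSE={rmse:.3f}")

# Matrix completion output: keep specified ratings, fill unspecified ones with predictions.
completed = [[r if r is not None else predict(u, i) for i, r in enumerate(row)]
             for u, row in enumerate(R)]
```

Any stronger predictor (such as the collaborative filtering methods below) can replace fit_user_means without changing the evaluation or completion steps.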

Ranking Version (top-k problem)

Determine the top-k items for a user or the top-k users for an item. Finding the top-k items for a user is more common.
The absolute values of the predicted ratings are not important.

The prediction version is more general, because ranking the predicted ratings yields the top-k results. However, solving the ranking problem directly is often easier.
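A minimal sketch of turning predicted ratings into a top-k list, assuming the predictions already exist (the values below are made up). Items the user has already rated are excluded, and only the relative order of the scores matters, which is why the ranking version does not need accurate absolute ratings.

```python
import heapq

def top_k(predicted_row, rated_items, k):
    """Return the k unrated item indices with the highest predicted scores."""
    candidates = ((score, item) for item, score in enumerate(predicted_row)
                  if item not in rated_items)
    return [item for score, item in heapq.nlargest(k, candidates)]

# Hypothetical predicted ratings for one user over 6 items; items 0 and 3 are already rated.
predicted = [4.9, 2.1, 3.7, 4.5, 4.2, 1.0]
print(top_k(predicted, rated_items={0, 3}, k=2))  # [4, 2]
```

Shifting or rescaling all scores by the same positive factor leaves the returned list unchanged.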

Operational and Technical Goals

Relevance: The primary goal, since users are more likely to consume items they find interesting. Relevance alone is not sufficient, however.
Novelty: The recommended item should be something the user has not seen in the past (e.g., avoid recommending the most popular movie in a genre, which the user has probably already seen).
Serendipity: Related to, but different from, novelty. Recommend something unexpected or surprising to the user (as opposed to something they simply did not know about before). Serendipity has the beneficial side effect of increasing diversity, but care is needed to avoid irrelevant recommendations.
Increasing recommendation diversity: Top-k lists should contain items of different types so that the user does not get bored.
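Diversity of a top-k list is often quantified as its intra-list diversity, the average pairwise dissimilarity of the recommended items. The sketch below assumes items are described by genre tags and uses Jaccard distance as the dissimilarity; both choices, and the movie data, are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - size(intersection) / size(union) for two sets of item attributes."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def intra_list_diversity(items, attributes):
    """Average pairwise Jaccard distance between the items of a recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(attributes[x], attributes[y]) for x, y in pairs) / len(pairs)

# Hypothetical genre tags for a handful of movies.
genres = {
    "m1": {"action", "sci-fi"},
    "m2": {"action", "sci-fi"},
    "m3": {"action", "thriller"},
    "m4": {"comedy", "romance"},
}
print(intra_list_diversity(["m1", "m2", "m3"], genres))  # lower: mostly action titles
print(intra_list_diversity(["m1", "m3", "m4"], genres))  # higher: includes a comedy
```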

Providing the user an explanation of why a particular item is recommended is often useful.

Historical Context

GroupLens: A pioneering recommender system for Usenet news. The GroupLens team is also notable for releasing several benchmark datasets (GroupLens, BookLens, MovieLens).

Amazon.com: Recommendations are provided on the basis of explicitly provided user ratings (1-5 stars), buying behavior, and browsing behavior. Explanations for recommendations are provided.

Netflix Movie Recommender: A 5-point user rating scale and user actions in terms of watching various items are used to make recommendations. Explanations for recommendations are provided, giving the user an understanding of why they might find a movie interesting. The Netflix Prize contest had a significant impact on recommender systems research.

Google News: Recommends news to users based on their history of clicks on different news items. Such ratings are unary (they can express a like but not a dislike) and implicit. Explicit variations are also possible.

Facebook Friend Recommendation: This uses different algorithms from ratings-based prediction because it relies on structural relationships rather than ratings data. The problem is referred to as link prediction in social network analysis.

Collaborative Filtering Models

Collaborative filtering models use the collaborative power of ratings provided by multiple users to make recommendations. The main challenge is that the underlying ratings matrices are sparse. The basic idea:

Unspecified ratings can be imputed because observed ratings are highly correlated across various users and items.
Models leverage either item-item correlations or inter-user correlations for the prediction process (or both).

Two types of methods commonly used in Collaborative Filtering (CF):

Memory-based methods (neighborhood-based CF algorithms): These were among the earliest CF algorithms. Ratings of user-item combinations are predicted on the basis of their neighborhoods. The neighborhood can be defined in two ways:
- User-based CF: Ratings provided by like-minded users of a target user A are used to make recommendations for A. Similarity functions are computed between the rows of the ratings matrix to discover similar users.
- Item-based CF: Similarity functions are computed between the columns of the ratings matrix to discover similar items. To predict the rating of target item B by user A:
  1. Determine the set S of items that are most similar to target item B.
  2. Use the ratings that A has specified for items in S to predict whether A will like B.
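The two neighborhood variants can be sketched on a toy matrix. The similarity and aggregation choices here are common ones but are assumptions: user-based CF uses the Pearson correlation over co-rated items and a mean-centered weighted average of the k most similar peers' ratings; item-based CF uses the adjusted cosine (ratings centered by each user's mean) and a similarity-weighted average of the user's own ratings for the k most similar items (the set S). Indices are zero-based and k = 2 is arbitrary.

```python
import math

# Toy ratings matrix (rows = users, columns = items); None marks a missing rating.
R = [
    [7, 6, 7, 4, 5, 4],
    [6, 7, None, 4, 3, 4],
    [None, 3, 3, 1, 1, None],
    [1, 2, 2, 3, 3, 4],
    [1, None, 1, 2, 3, 3],
]
n_users, n_items = len(R), len(R[0])

def user_mean(u):
    vals = [r for r in R[u] if r is not None]
    return sum(vals) / len(vals)

def centered(u, i):
    """Rating of user u for item i minus u's mean rating (None if unspecified)."""
    return None if R[u][i] is None else R[u][i] - user_mean(u)

def cosine(pairs):
    """Cosine similarity of two vectors given as (x, y) pairs over their common entries."""
    num = sum(x * y for x, y in pairs)
    den = math.sqrt(sum(x * x for x, _ in pairs)) * math.sqrt(sum(y * y for _, y in pairs))
    return num / den if den else 0.0

def user_sim(u, v):
    """Pearson correlation between rows u and v over co-rated items."""
    return cosine([(centered(u, i), centered(v, i)) for i in range(n_items)
                   if R[u][i] is not None and R[v][i] is not None])

def item_sim(i, j):
    """Adjusted cosine between columns i and j over users who rated both."""
    return cosine([(centered(u, i), centered(u, j)) for u in range(n_users)
                   if R[u][i] is not None and R[u][j] is not None])

def predict_user_based(u, j, k=2):
    # Peer group: the k users most similar to u among those who rated item j.
    peers = sorted((v for v in range(n_users) if v != u and R[v][j] is not None),
                   key=lambda v: user_sim(u, v), reverse=True)[:k]
    sims = [user_sim(u, v) for v in peers]
    num = sum(s * centered(v, j) for s, v in zip(sims, peers))
    den = sum(abs(s) for s in sims)
    return user_mean(u) + (num / den if den else 0.0)

def predict_item_based(u, t, k=2):
    # Set S: the k items most similar to target t among those that u has rated.
    similar = sorted((j for j in range(n_items) if j != t and R[u][j] is not None),
                     key=lambda j: item_sim(j, t), reverse=True)[:k]
    sims = [item_sim(j, t) for j in similar]
    den = sum(abs(s) for s in sims)
    return sum(s * R[u][j] for s, j in zip(sims, similar)) / den if den else user_mean(u)

print(round(predict_user_based(2, 0), 2), round(predict_user_based(2, 5), 2))
print(round(predict_item_based(2, 0), 2), round(predict_item_based(2, 5), 2))
```

On this toy matrix both variants predict that user 2 will rate item 0 clearly higher than item 5.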

Memory-based techniques are simple to implement and their recommendations are often easy to explain, but they do not work very well with sparse ratings matrices. If only the top-k items are required, however, the lack of coverage (not enough ratings to make predictions) is often not an issue.

Model-based methods: Machine learning and data mining methods are used to build predictive models, whose parameters are learned from the training data. Hybrid techniques that combine memory-based and model-based methods are becoming popular.
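As one example of a model-based method, the sketch below fits a latent factor model (matrix factorization) by stochastic gradient descent: each user and each item gets a short vector of learned parameters, and a rating is predicted as their dot product. The notes do not prescribe this model; the rank, learning rate, regularization weight, epoch count, and toy matrix are illustrative assumptions.

```python
import math
import random

# Toy ratings matrix (rows = users, columns = items); None = unspecified. Illustrative only.
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]
observed = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r is not None]

def factorize(triples, n_users, n_items, rank=2, lr=0.02, reg=0.02, epochs=3000, seed=0):
    """Fit U (n_users x rank) and V (n_items x rank) by SGD so that dot(U[u], V[i]) ~ r_ui."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.3) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.3) for _ in range(rank)] for _ in range(n_items)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(rank):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on the squared error plus an L2 penalty on the factors.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def predict(U, V, u, i):
    return sum(a * b for a, b in zip(U[u], V[i]))

U, V = factorize(observed, n_users=len(R), n_items=len(R[0]))
train_rmse = math.sqrt(sum((predict(U, V, u, i) - r) ** 2 for u, i, r in observed) / len(observed))
print(f"training RMSE: {train_rmse:.3f}")
print("predicted rating of user 1 for item 1:", round(predict(U, V, 1, 1), 2))
```

Once trained, only the factors U and V are needed to score any user-item pair, rather than the full ratings matrix.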

Types of Ratings

Rating Type	Description
Interval-based	Discrete set of ordered, equally spaced numbers, e.g., 1 to 5 or -2 to 2
Ordinal	Ordered categorical values, e.g., strongly disagree to strongly agree
Binary	Like / dislike; 0 or 1
Unary	Only able to specify a like; common in implicit feedback data sets

Ordered ratings (1-5) are more expressive than unary ratings (presence-absence).
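Unary implicit feedback can be sketched as follows, assuming a hypothetical click log of (user, item) pairs. Repeated clicks collapse to a single 1, and absent interactions are left unspecified (None) rather than set to 0, since a missing click says nothing about dislike. Many implicit-feedback methods nevertheless treat missing entries as 0 as a pragmatic approximation; that is a modeling choice, not something the data states.

```python
# Hypothetical click log of (user, item) pairs; a click can only express interest.
clicks = [("alice", "n1"), ("alice", "n3"), ("bob", "n1"), ("alice", "n1")]
users = sorted({u for u, _ in clicks})
items = sorted({i for _, i in clicks})

# Unary matrix: 1 where the user interacted, None (unspecified) everywhere else.
clicked = set(clicks)
unary = {u: {i: (1 if (u, i) in clicked else None) for i in items} for u in users}
print(unary)
```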

Relationship with other Modeling Problems

Collaborative filtering models are closely related to missing value analysis, which studies the problem of imputing entries in an incompletely specified data matrix. CF is a special case of this problem for a very large and sparse data matrix, so techniques from missing value analysis can also be used for CF.

CF methods can be viewed as generalizations of classification and regression modeling. In classification problems, the class (dependent) variable can be viewed as an attribute with missing values, while the other features are the independent variables. CF generalizes this because any column is allowed to have missing values (not only the class variable), and any row might have missing entries, so there is no distinction between training and test rows. Therefore, in CF models prediction is performed in an entry-wise rather than a row-wise fashion.

Content-Based Recommender Systems

In content-based recommender systems, the descriptive attributes (“content”) of items are used to make recommendations. For example, the descriptive attributes of a movie could include its genre or content rating. For each user, the items rated by that user serve as training data for a user-specific classification or regression model, with the item attributes as features and the user's rating as the class or dependent variable. This model is then used to predict whether the user will like an item whose rating is unknown.
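A per-user content-based model can be sketched with one-hot genre features and a small linear regression trained only on the target user's ratings. The features, the ratings, and the choice of L2-regularized linear regression fitted by batch gradient descent are assumptions for illustration; any classifier or regressor could play the same role.

```python
# Hypothetical item attributes: one-hot genres [action, comedy, drama].
item_features = {
    "m1": [1, 0, 0],
    "m2": [1, 0, 1],
    "m3": [0, 1, 0],
    "m4": [0, 1, 1],
    "m5": [1, 0, 0],  # not yet rated by this user
    "m6": [0, 1, 0],  # not yet rated by this user
}
# The target user's ratings are the training labels for this user's model only.
user_ratings = {"m1": 5, "m2": 4, "m3": 1, "m4": 2}

def fit_linear(X, y, lr=0.1, reg=0.01, epochs=5000):
    """L2-regularized linear regression fitted with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - target
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * (g / n + reg * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

rated = sorted(user_ratings)
w, b = fit_linear([item_features[m] for m in rated], [user_ratings[m] for m in rated])
unrated = [m for m in item_features if m not in user_ratings]
predictions = {m: sum(wi * xi for wi, xi in zip(w, item_features[m])) + b for m in unrated}
print({m: round(p, 2) for m, p in predictions.items()})
```

Because the model sees only this user's ratings, the unrated action title scores high and the unrated comedy scores low, with no input from other users.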

Pro: Useful when sufficient rating data is not available for an item, because other items with similar attributes might have been rated by the user.

Con:

May provide “obvious” recommendations because of the use of keywords or content. Since the constructed model is specific to the user, community knowledge from similar users is not used, which reduces the diversity of recommended items.
Effective at providing recommendations for new items, but not for new users. A large number of ratings from the user is usually needed to make robust predictions without overfitting.

Knowledge-Based Recommender Systems

Particularly useful for items that are not purchased often (real estate, automobiles, etc.). Also useful in the context of the cold-start problem (not enough ratings available).
If the item domain is complex (many possible choices), sufficient ratings may not be available for the large number of combinations.
Ratings are not used for recommendations.
Knowledge bases (rules and similarity functions) are used along with customer requirements and item descriptions to make recommendations. The user is allowed to specify explicitly what they want.
There are two types of knowledge-based systems, classified on the basis of interface type:
- Constraint-based – The user specifies requirements or constraints (color, make, etc.), and the system uses domain-specific rules to match user requirements to item attributes.
- Case-based – Specific cases are used as targets or anchor points. Similarity metrics defined on item attributes are used to retrieve items similar to these cases. The process can be iterative.
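Both interface styles can be sketched over a hypothetical car catalog. The constraint-based function keeps items whose attributes satisfy every user-specified predicate; the case-based function ranks items by a weighted similarity to an anchor case (exact match on make and color, range-normalized distance on price). The attribute names, weights, and catalog are illustrative assumptions.

```python
# Hypothetical car catalog; attribute names and values are illustrative.
cars = [
    {"id": "c1", "make": "Toyota", "color": "red", "price": 21000},
    {"id": "c2", "make": "Honda", "color": "blue", "price": 24000},
    {"id": "c3", "make": "Toyota", "color": "blue", "price": 35000},
    {"id": "c4", "make": "Ford", "color": "red", "price": 19000},
]

def constraint_based(items, constraints):
    """Keep items whose attributes satisfy every constraint (attribute -> predicate)."""
    return [it for it in items if all(pred(it[attr]) for attr, pred in constraints.items())]

def case_similarity(a, b, weights, price_range):
    """Weighted similarity: exact match for make and color, scaled distance for price."""
    sim = weights["make"] * (a["make"] == b["make"])
    sim += weights["color"] * (a["color"] == b["color"])
    sim += weights["price"] * (1 - abs(a["price"] - b["price"]) / price_range)
    return sim / sum(weights.values())

def case_based(items, target, weights, k=2):
    """Return the k items most similar to the target (anchor) case."""
    prices = [it["price"] for it in items]
    price_range = (max(prices) - min(prices)) or 1
    return sorted(items, key=lambda it: case_similarity(it, target, weights, price_range),
                  reverse=True)[:k]

matches = constraint_based(cars, {"color": lambda c: c == "red", "price": lambda p: p <= 22000})
anchor = {"make": "Toyota", "color": "blue", "price": 30000}
similar = case_based(cars, anchor, weights={"make": 2, "color": 1, "price": 1})
print([c["id"] for c in matches], [c["id"] for c in similar])
```

In an iterative session, the user could pick one of the returned cars as the new anchor (or adjust the anchor's attributes) and run the retrieval again.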

Utility-based: A utility function is defined on item features to compute the probability of a user liking the item.
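A minimal sketch of a utility function, assuming hypothetical feature names and hand-set weights: the weighted feature sum is passed through a logistic function so the result lies between 0 and 1 and can be read as a probability of liking the item. The logistic form and the weights are assumptions; in practice they could come from domain knowledge or from the user.

```python
import math

def utility(item, weights, bias=0.0):
    """Hypothetical utility: weighted feature sum mapped to (0, 1) by a logistic function."""
    score = bias + sum(weights[f] * item[f] for f in weights)
    return 1 / (1 + math.exp(-score))

# Hypothetical weights: fuel economy and safety add utility, price (in thousands) subtracts it.
weights = {"fuel_economy": 1.5, "price_k": -0.1, "safety": 1.0}
cars = {
    "c1": {"fuel_economy": 0.8, "price_k": 21, "safety": 0.6},
    "c2": {"fuel_economy": 0.4, "price_k": 35, "safety": 0.9},
}
ranked = sorted(cars, key=lambda c: utility(cars[c], weights), reverse=True)
print(ranked, [round(utility(cars[c], weights), 2) for c in ranked])
```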

Demographic:

Use demographic information about the user to learn classifiers that can map specific demographics to ratings or buying propensities.
Can be combined with additional context to create context-sensitive recommendations.
Usually best used in combination with other methods (not as a standalone system).
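As a minimal stand-in for a demographic classifier, the sketch below estimates a smoothed purchase rate for each (age band, item) pair from a hypothetical purchase history and ranks items for a given age band. The age bands, the Laplace smoothing with alpha = 1, and the data are assumptions; a real system would more likely train a proper classifier, such as logistic regression, on richer demographic features.

```python
from collections import Counter

# Hypothetical purchase history: (age_band, item, bought) with bought in {0, 1}.
history = [
    ("18-25", "headphones", 1),
    ("18-25", "headphones", 1),
    ("18-25", "headphones", 0),
    ("46-60", "headphones", 0),
    ("46-60", "garden_tools", 1),
    ("46-60", "garden_tools", 1),
    ("18-25", "garden_tools", 0),
]

def fit_propensity(history, alpha=1.0):
    """Smoothed purchase rate per (group, item): (buys + alpha) / (trials + 2 * alpha)."""
    buys, trials = Counter(), Counter()
    for group, item, bought in history:
        buys[(group, item)] += bought
        trials[(group, item)] += 1
    def propensity(group, item):
        return (buys[(group, item)] + alpha) / (trials[(group, item)] + 2 * alpha)
    return propensity

propensity = fit_propensity(history)
catalog = sorted({item for _, item, _ in history})

def recommend(group):
    """Rank catalog items by estimated buying propensity for a demographic group."""
    return sorted(catalog, key=lambda item: propensity(group, item), reverse=True)

print(recommend("46-60"), recommend("18-25"))
```

Unseen groups fall back to a propensity of 0.5 for every item, which illustrates why demographic signals are usually combined with other methods.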

Hybrid or Ensemble-based Recommender Systems

The different techniques work well in different scenarios and have different strengths and weaknesses. If a wide variety of inputs is available, different types of recommender systems can be used for the same task (hybrid systems). Ensemble-based recommender systems can combine multiple data sources or even multiple models of the same type to improve effectiveness.
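One simple way to build a hybrid is a weighted combination of the scores produced by different recommenders. The sketch assumes two hypothetical score maps (one from a collaborative filtering model, one from a content-based model), rescales each to [0, 1] with min-max normalization so they are comparable, and mixes them with fixed weights; the weights and the normalization are illustrative choices.

```python
def min_max(scores):
    """Rescale a {item: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {item: (s - lo) / span for item, s in scores.items()}

def weighted_hybrid(score_maps, weights):
    """Linear combination of per-recommender scores after min-max normalization."""
    normalized = [min_max(s) for s in score_maps]
    items = set().union(*normalized)
    # An item missing from one recommender contributes 0 from that recommender.
    return {item: sum(w * n.get(item, 0.0) for w, n in zip(weights, normalized))
            for item in items}

cf_scores = {"a": 4.5, "b": 3.0, "c": 2.0}       # e.g. predicted ratings from a CF model
content_scores = {"a": 0.2, "b": 0.9, "d": 0.7}  # e.g. outputs of a content-based model
combined = weighted_hybrid([cf_scores, content_scores], weights=[0.6, 0.4])
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Here an item scored by only one recommender gets 0 from the other, which is one of several reasonable ways to handle missing scores.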

References

Recommender Systems – Charu C. Aggarwal
Toward the Next Generation of Recommender Systems – G. Adomavicius and A. Tuzhilin
Recommender Systems Survey – J. Bobadilla et al. (2013)