Recommender Systems

(Summarized notes from Recommender Systems (intro chapter) and survey papers) The basic idea of recommender systems is to use different sources of feedback data to infer customer interests. This feedback can be explicit (direct), such as user ratings (likes/dislikes, 1-5 scales, etc.), or implicit, based on click or purchase behavior. In the most common formulation, …
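To make the explicit/implicit distinction concrete, here is a minimal sketch that turns both kinds of feedback into sparse user-item maps. The users, items, and especially the event weights (`click` = 1, `purchase` = 3) are hypothetical assumptions for illustration; real systems choose such weights empirically.

```python
from collections import defaultdict

# Explicit feedback: (user, item, rating on a 1-5 scale). Hypothetical data.
explicit = [("alice", "book", 5), ("alice", "lamp", 2), ("bob", "book", 4)]

# Implicit feedback: (user, item, event). Events carry no rating, so we map
# them to confidence weights. These weights are an assumption, not a standard.
implicit = [("bob", "lamp", "click"), ("carol", "book", "purchase"), ("carol", "book", "click")]
EVENT_WEIGHTS = {"click": 1.0, "purchase": 3.0}

def explicit_matrix(ratings):
    """Sparse user -> {item: rating} map; absent entries are unknown, not zero."""
    matrix = defaultdict(dict)
    for user, item, rating in ratings:
        matrix[user][item] = rating
    return dict(matrix)

def implicit_matrix(events, weights):
    """Sum event weights per (user, item) into a confidence-style score."""
    matrix = defaultdict(lambda: defaultdict(float))
    for user, item, event in events:
        matrix[user][item] += weights[event]
    return {user: dict(items) for user, items in matrix.items()}

R = explicit_matrix(explicit)
C = implicit_matrix(implicit, EVENT_WEIGHTS)
print(R)  # {'alice': {'book': 5, 'lamp': 2}, 'bob': {'book': 4}}
print(C)  # {'bob': {'lamp': 1.0}, 'carol': {'book': 4.0}}
```

The sparse dictionaries reflect that most user-item pairs are never observed; a missing entry means "no feedback", which is different from a low rating.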

The Basic Outlier Detection Models

(Notes from Outlier Analysis by Charu Aggarwal) Factors influencing the choice of outlier model: data type, data size, availability of labeled outliers, and the need for interpretability (very desirable). Interpretability of model results: a model that can describe why a particular data point is considered an outlier could give the analyst further hints about the diagnosis and …
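As an illustration of an interpretable model, the sketch below scores a point by its largest per-feature z-score and also reports which feature produced that score, so the analyst sees *why* the point stands out. The choice of per-feature z-scores and the sample data are assumptions for illustration, not the book's specific method.

```python
import statistics

def explain_outlier(data, point):
    """Return (score, feature_index): the largest per-feature |z-score| of
    `point` relative to `data`, and the feature responsible for it."""
    best_feature, best_z = None, 0.0
    for j in range(len(point)):
        column = [row[j] for row in data]
        mean = statistics.mean(column)
        std = statistics.pstdev(column)
        if std == 0:
            continue  # constant feature: no spread to compare against
        z = abs(point[j] - mean) / std
        if z > best_z:
            best_feature, best_z = j, z
    return best_z, best_feature

# Hypothetical records: (age, income in thousands)
normal = [(30, 50), (32, 52), (31, 48), (29, 50)]
score, feature = explain_outlier(normal, (31, 90))
print(feature)  # 1 -> income is the feature that makes this point unusual
```

Reporting the responsible feature alongside the score is what makes the result actionable; a single opaque score would only say *that* the point is unusual.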

Outlier Analysis: The Data model is everything

(Notes from Outlier Analysis, Chapter 1, by Charu C. Aggarwal) Almost all outlier detection algorithms follow the same approach: first, create a model of the normal patterns in the data; then, for a given data point, compute an outlier score based on its deviation from this model. This is done by evaluating the quality of the fit between the data point and the …
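The two-step recipe (model the normal pattern, then score each point by its fit to that model) can be sketched with a simple linear model: fit a least-squares line and use each point's absolute residual as its outlier score. The linear model and the sample data are assumptions chosen for brevity; any model of "normal" data fits the same pattern.

```python
import statistics

def fit_line(xs, ys):
    """Step 1: model the normal pattern as a least-squares line y = slope*x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def outlier_scores(xs, ys):
    """Step 2: score each point by how badly it fits the model (absolute residual)."""
    slope, intercept = fit_line(xs, ys)
    return [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]

# Hypothetical data that roughly follows y = 2x, except the point at x = 5.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.0, 8.1, 30.0, 12.0]
scores = outlier_scores(xs, ys)
print(scores.index(max(scores)))  # 4 -> the point (5, 30.0)
```

Note that the model here is fitted on data that includes the outlier, which pulls the line toward it and shrinks its residual; this is why the choice of model, and of what data it is fitted on, matters so much.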

Intro to Outlier Analysis

(Notes from Outlier Analysis by Charu C. Aggarwal, Section 1.1) Hawkins' definition: "An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism." Data is created by generating processes (system activity or monitoring activity), and unusual behavior in those processes creates outliers. An outlier often …

Column-oriented Storage and Column Families

Traditional databases store data in a row-oriented fashion, i.e., all the values from one row of a table are stored contiguously. Column-oriented storage instead stores all the values from each column together. The advantages of columnar storage are: queries that read a single column need to fetch data only from that column's file; better compression: storing similar …
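A toy in-memory sketch of the two layouts, using Python lists as stand-ins for on-disk files. The table and the use of run-length encoding as the compression scheme are assumptions for illustration; real column stores use a variety of encodings.

```python
from itertools import groupby

# A tiny hypothetical table: (date, product, quantity)
rows = [
    ("2024-01-01", "apple", 3),
    ("2024-01-01", "apple", 5),
    ("2024-01-01", "pear", 2),
    ("2024-01-02", "pear", 7),
]
columns = ["date", "product", "quantity"]

# Row-oriented: each row's values stored contiguously in one sequence.
row_store = [value for row in rows for value in row]

# Column-oriented: one sequence (conceptually one file) per column.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(columns)}

def column_from_row_store(store, n_cols, col_index):
    """Pick every n_cols-th value; on disk, the pages read also hold the other columns."""
    return store[col_index::n_cols]

def run_length_encode(values):
    """Compress runs of repeated adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

print(sum(column_from_row_store(row_store, 3, 2)))  # 17
print(sum(column_store["quantity"]))                # 17
print(run_length_encode(column_store["date"]))      # [('2024-01-01', 3), ('2024-01-02', 1)]
```

Both layouts give the same answer, but only the column layout keeps the `quantity` values adjacent, and only there do repeated values such as the dates sit next to each other, which is what makes run-length encoding effective.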