We kick off the Uhuru Data Lab (UDL) blog with this first post. After some exciting data crunching, we present our approach to the recently concluded Coupon Purchase Prediction competition, hosted by Recruit for its leading coupon site Ponpare (ポンパレ).
The competition asked teams to use one year of past purchase and browsing data from roughly 22,000 Ponpare users to predict which coupons each user will purchase over the following week. We present a simple gradient boosted decision tree model that scores 0.0072 on the Leaderboard.
Overview, cleaning the data and evaluation metric:
With any real dataset, cleaning is the first step: understand the files through entity-relationship diagrams, compute basic statistics and study the basic characteristics of the data.
Understanding the data:
In this case we had the coupon purchase and view logs of 22,873 users from 01-07-2011 to 23-06-2012. In total, these users viewed 32,628 coupons on the website over this period, but only 19,413 of those coupons come with attributes such as price rate, discount price, usable dates and shop location. We treat these as the training-set coupons. Predictions are to be made for 310 new coupons released over the next week (24-06-2012 to 30-06-2012), whose attributes are also provided. The challenge: which of these 310 coupons will each of the 22,873 users purchase? The past browsing data contains attributes such as view date, user id, purchase flag, session id and viewed coupon id. There is also user data: age, gender, prefecture and registration/withdrawal date.
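For a concrete first look, loading and counting might look like the sketch below. The CSV file and column names are taken from the competition's Kaggle data page and are our assumption here, since this post does not list them:

```r
# Load the main files (names as on the competition data page).
user_list    <- read.csv("user_list.csv")          # users: age, gender, prefecture, reg/withdrawal dates
coupon_train <- read.csv("coupon_list_train.csv")  # training coupons with attributes
coupon_test  <- read.csv("coupon_list_test.csv")   # the 310 test-week coupons
visit_log    <- read.csv("coupon_visit_train.csv") # the view/purchase weblog

c(users = nrow(user_list), train_coupons = nrow(coupon_train),
  test_coupons = nrow(coupon_test))
table(visit_log$PURCHASE_FLG)  # views (0) vs. purchases (1)
```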
It turns out coupons can be purchased without any recorded views. We assumed that each such purchase has one view at exactly the date and time the purchase occurred.
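A minimal sketch of this imputation, continuing from the loading snippet above. The weblog schema (in particular the `PURCHASEID_hash` link between the two files) is our assumption about the data:

```r
# Purchases without a matching view get one synthetic view row,
# timestamped at the purchase itself (our imputation assumption).
detail <- read.csv("coupon_detail_train.csv")  # one row per purchase
noview <- !(detail$PURCHASEID_hash %in% visit_log$PURCHASEID_hash)
imputed <- data.frame(PURCHASE_FLG        = 1L,
                      I_DATE              = detail$I_DATE[noview],
                      VIEW_COUPON_ID_hash = detail$COUPON_ID_hash[noview],
                      USER_ID_hash        = detail$USER_ID_hash[noview])

library(dplyr)  # bind_rows() fills the weblog-only columns with NA
visit_log <- bind_rows(visit_log, imputed)
```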
The MAP@10 metric:
The evaluation metric was Mean Average Precision @ 10 (MAP@10). A good explanation of this metric can be found here. Under this metric, only the top 10 coupon predictions for each user matter, and their order matters too. We observed from the data that most users who purchase coupons in a given week view at most 10 coupons in that week.
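For concreteness, here is a small R implementation of average precision at 10 and its mean over users. This is our own sketch of the standard definition, not competition code:

```r
# Average precision at k for one user: 'predicted' is an ordered vector of
# coupon ids, 'actual' the set of coupons the user actually purchased.
apk <- function(actual, predicted, k = 10) {
  predicted <- head(predicted, k)
  if (length(actual) == 0) return(0)
  hits <- predicted %in% actual
  precision_at_i <- cumsum(hits) / seq_along(hits)   # precision at each rank
  sum(precision_at_i * hits) / min(length(actual), k)
}

# MAP@10 is the mean of apk() over all users.
mapk <- function(actual_list, predicted_list, k = 10) {
  mean(mapply(apk, actual_list, predicted_list, MoreArgs = list(k = k)))
}
```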
Naturally, the numbers of viewers and views are much larger than the numbers of purchasers and purchases. To obtain balanced training data and reduce sparsity (as we explain later), we changed the problem statement and predicted views instead of purchases. Interestingly, the resulting Leaderboard score was sufficient to reach the top 40.
The ideas and the XGBoost model:
The problem is essentially cold-start item recommendation with both user and item attributes present. User attributes can be extracted from each user's past weblog; coupon attributes are given.
XGBoost GBDT model:
We used XGBoost's gradient boosted decision tree model, as follows:
1. Training set: for the previous week, form pairs of viewers (users who viewed coupons released that week) and those coupons.
2. The target variable is 1 if a user-coupon pair corresponds to a view, else 0.
3. To balance the training set, randomly sample 13 coupons per viewer as negatives and include all pairs with target 1. This gives an approximately 30:70 ratio of the target variable.
4. Train with the AUC metric and predict on all user-coupon pairs of the next week. For each user, take the 10 coupons with the highest predicted probability as the recommendations.
5. (Optional) Build several training datasets in step 3, then train and average the predictions in step 4 (we averaged 17 such models). A minimal sketch of steps 1-4 follows.
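As a rough illustration, here is a minimal xgboost sketch of steps 1-4 in R. The data frame names, feature columns and hyperparameters (`eta`, `max_depth`, `nrounds`) are illustrative placeholders, not the exact values from our repository:

```r
library(xgboost)

# train_pairs: (viewer, coupon) pairs from the previous week, with engineered
# features in feature_cols and target 'viewed' (1 = view, 0 = sampled negative;
# the 13 random negatives per viewer of step 3 are built upstream).
dtrain <- xgb.DMatrix(data = as.matrix(train_pairs[, feature_cols]),
                      label = train_pairs$viewed)
model <- xgb.train(params = list(objective = "binary:logistic",
                                 eval_metric = "auc",
                                 eta = 0.1, max_depth = 6),
                   data = dtrain, nrounds = 200)

# Score every (user, new-coupon) pair for the next week and keep the
# 10 highest-probability coupons per user as the recommendations.
test_pairs$pred <- predict(model, as.matrix(test_pairs[, feature_cols]))
top10 <- do.call(rbind, by(test_pairs, test_pairs$USER_ID_hash,
                           function(d) head(d[order(-d$pred), ], 10)))
```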
This model alone gives a MAP@10 of ~0.0072 on the Private Leaderboard. Cross-validation was performed by training on one week's viewer-coupon pairs and predicting on the following week's user-coupon pairs.
Other ideas (BPRMF-kNN model):
Another approach we tried was matrix factorization: specifically, BPRMF with a k-NN mapping, as explained in this paper (Learning Attribute-to-Feature Mappings for Cold-Start Recommendations). The idea is to build a user-coupon matrix over all users and all coupons (with given attributes) released before the current week. The matrix contains 1 if a purchase occurred, else 0; its sparsity is around 99.96%.
We used the open-source recommender-system library MyMediaLite, which already implements BPRMF for item prediction.
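MyMediaLite's item recommenders read positive implicit feedback as plain user-item pairs, so exporting the purchases from R can be as simple as the sketch below. The column names and the exact expected file format are our assumptions; check the MyMediaLite documentation:

```r
# Write positive implicit feedback ("user item" pairs, purchases only) in the
# plain separated-values format MyMediaLite's item recommenders consume.
visit_log <- read.csv("coupon_visit_train.csv")
purchases <- unique(visit_log[visit_log$PURCHASE_FLG == 1,
                              c("USER_ID_hash", "VIEW_COUPON_ID_hash")])
write.table(purchases, "bprmf_train.txt", sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)
```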
- Factorizing with 160 factors seemed to perform well.
- Use a k-NN mapping (k = 8 performed best) to map new coupons from their attribute space to the 160-factor space.
- We used only binary 0-1 coupon attributes, with cosine similarity as the similarity function in the k-NN mapping; a sketch of this mapping follows.
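Here is a hedged sketch of the mapping step, our own re-implementation of the paper's idea rather than MyMediaLite code. The matrices `A` and `F` are assumptions about how the learned factors are stored:

```r
# A: binary attribute matrix of training coupons (rows = coupons, cols = attributes)
# F: their learned 160-dimensional BPRMF item factors (same row order as A)
cosine_sims <- function(a, B) {
  # cosine similarity between attribute vector a and every row of matrix B
  as.vector(B %*% a) / (sqrt(sum(a^2)) * sqrt(rowSums(B^2)))
}

map_new_coupon <- function(a_new, A, F, k = 8) {
  nn <- order(cosine_sims(a_new, A), decreasing = TRUE)[1:k]
  colMeans(F[nn, , drop = FALSE])  # average factors of the k most similar coupons
}

# A new coupon is then ranked for user u by the usual dot product:
# user_factors[u, ] %*% map_new_coupon(a_new, A, F)
```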
This model gave slightly lower performance than the XGBoost model above (~0.0068 on the Private Leaderboard). Ensembling the two did not seem to help significantly.
Our main focus was on the features used in the XGBoost model. We grouped them into three categories: coupon features, user features and user-coupon-pair features. The user features split further into global user features (independent of the user's weblog) and week-dependent user features (derived from the weblog prior to the given week).
The user-coupon-pair features measure the affinity between a user and a coupon with respect to attributes such as capsule text, genre name, shop large-area name, catalog price and usable dates. Affinity is computed from the user's past weblog: for each user, a score between 0 and 1 is calculated against each of these one-hot encoded coupon attributes. The affinity measures also use the page_serial, session_id and purchase_flag fields in the data. These are the most important features in the model.
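As an illustration, a genre-affinity feature could be computed roughly as follows. The column names are illustrative and the actual repository code differs:

```r
library(data.table)
# visits: each user's weblog prior to the given week, already joined with the
# viewed coupon's attributes so every row carries a GENRE_NAME.
visits <- as.data.table(visits)
genre_aff <- visits[, .N, by = .(USER_ID_hash, GENRE_NAME)]
genre_aff[, affinity := N / sum(N), by = USER_ID_hash]  # score in [0, 1] per user
# Joining this onto (user, coupon) pairs by GENRE_NAME yields a pair-level
# feature; weighting counts by purchase_flag gives a purchase-affinity variant.
```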
Code, parameters and untried approaches:
Everything was run on an Amazon EC2 r3.8xlarge instance using RStudio Server. Generating the affinity features is especially time-consuming, as is model training. This repository contains the complete reproducible R code.
Other approaches could be explored: for example, training the XGBoost model on purchases instead of views (or trying other classifiers such as random forests or k-NN), or using other matrix factorization methods such as BPRLinear or WRMF from the MyMediaLite package.