
Open data is the flavor of the moment. Governments the world over are realizing the potential of open data in governance and city planning. However, putting open data to work in real-world enterprises can be tricky.

Japan is experimenting with ways to use open data to reduce food waste. Trimming food waste matters not only to the environment and the economy but also to retail companies, which can increase their margins substantially. Japan imports around 60 percent of its food supply, and 17-23 million tons a year are wasted! The Ministry of Economy, Trade and Industry (METI) aims to cut this waste by 30-40% by using open weather data and enabling manufacturers and distributors to share demand projections. With this in mind, we looked at one perishable food item, Tofu (a major component of the East Asian diet): how much of it is manufactured and how that correlates with weather data and other sensor data.

The data sets were available through this contest.


Using open weather data (provided by the Japan Weather Association), climate sensor data (provided by NTT Docomo Japan) and Tofu sales data (shared by different distributors in the Kanto region), we modelled the amount of Tofu that is manufactured every day.

The Data

We have made the data available on this GitHub repository, along with some R code for analysis.

  • The production data consists of the amount of Tofu manufactured in 2014 for 5 types of Tofu: A, B, C, D and E.

  • NTT Docomo provided hourly sensor data on temperature, humidity and precipitation throughout 2014 for 33 locations in the Kanto region of Japan.

  • The weather data contains measurements such as the mean, minimum and maximum of temperature, wind velocity, precipitation, snow, etc. for 53 locations in the Kanto region.

  • Lastly, many distributors collaborated to combine their data on daily tokubai (special bargain) sales of Tofu in 2014.


 

The Model

Given so much open data, there is a lot of analysis that can be done. We tried to predict the amount of Tofu manufactured each day using the given data. We built daily features such as total Tofu sales in the previous 7, 14, 35, … days and the mean/sd of temperature from 08:00 to 23:00 every day for different regions, along with similar features for other time windows and other sensor measurements (precipitation, humidity).
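The two kinds of features described above can be sketched in a few lines. This is a minimal illustration in Python, not the R code from the repository: the function names, the toy numbers, the choice to exclude the current day from the trailing sum, and the choice to treat 23:00 as inclusive are all assumptions made for the example.

```python
from statistics import mean, stdev


def trailing_sums(daily_values, window):
    """For each day, sum the `window` days before it (current day excluded).

    Days without a full window of history get None.
    """
    out = []
    for i in range(len(daily_values)):
        if i < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(daily_values[i - window:i]))
    return out


def daytime_mean_sd(hourly_readings, start_hour=8, end_hour=23):
    """Mean and sample sd of one day's hourly readings.

    `hourly_readings` is assumed to hold 24 values, index 0 = 00:00.
    Both start_hour and end_hour are treated as inclusive (an assumption).
    """
    window = hourly_readings[start_hour:end_hour + 1]
    return mean(window), stdev(window)


# Toy data, invented for illustration only.
sales = [10, 12, 9, 11, 14, 13, 8, 15, 10, 12]
prev7 = trailing_sums(sales, 7)

temps = [float(h) for h in range(24)]  # one fake day of hourly temperatures
t_mean, t_sd = daytime_mean_sd(temps)
```

Calling `trailing_sums` once per window length (7, 14, 35, and so on) and `daytime_mean_sd` once per location and measurement would give one feature column each.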

After cleaning and feature extraction, we fitted three models to these features in R: an Xgboost model, a Random Forest and a linear glmnet model. We show the prediction results for Tofu A.

 

Each model was scored by its root-mean-square error (RMSE) under 5-fold cross-validation on the 2014 data.
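As a rough illustration of that scoring scheme, here is a minimal Python sketch of a 5-fold cross-validated RMSE. Several things in it are assumptions, since the post does not specify them: the folds are drawn by taking every fifth day, the RMSE is pooled over all out-of-fold predictions, and a one-feature least-squares line stands in for the real models.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_rmse(xs, ys, k=5):
    """Pooled out-of-fold RMSE; fold membership is index modulo k (assumed)."""
    out_of_fold = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            out_of_fold[i] = model(xs[i])
    return rmse(ys, out_of_fold)


# Toy data, invented for illustration: production falls linearly with temperature.
temperature = [float(t) for t in range(20)]
production = [1000.0 - 20.0 * t for t in temperature]
score = cv_rmse(temperature, production)
```

Because every prediction is made by a model that never saw that day, the score estimates how the model would do on new data rather than how well it memorised 2014.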

Model            Target Variable    RMSE Score
Xgboost          A                  804.59
Random Forest    A                  642.91
Glmnet           A                  545.23

 

A visualization of the predictions is below. The red line represents the true value of Tofu A production, while the green line represents the predictions of each model. The corresponding RMSE scores are given in the table above.

[Figures: predicted vs. true Tofu A production for the Xgboost, Random Forest and glmnet models]

 

Notice that the RF and Xgboost models predict the general trend of the time series quite well, while the glmnet model predicts the peaks quite well. Ensembling or even stacking these models could therefore give a more accurate prediction, but that is a topic for a deeper discussion. The exciting observation is how accurately Tofu production can be predicted using weather data. What if we can generalise this to other perishable food markets? Because weather forecasts are readily available, retailers and manufacturers could estimate future Tofu demand in advance and produce accordingly.

 

Our final model was a glmnet ensemble of a couple of such models. It produced a cross-validation score of 487.38 and a final contest score of 550.12 (on 2015 data), which ranked 9th. The best score in the contest was 385.44.
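To show why combining models can beat each one alone, here is a minimal Python sketch of a two-model blend. It is a simplified stand-in, not the glmnet ensemble itself: the weight is found by a plain grid search, and the predictions, model names and numbers are all invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def blend(pred_a, pred_b, w):
    """Weighted average: w on model A, (1 - w) on model B."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def best_blend_weight(actual, pred_a, pred_b, steps=100):
    """Grid-search the weight in [0, 1] that minimises RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(actual, blend(pred_a, pred_b, w)))


# Toy numbers: one model follows the trend but undershoots the peak,
# the other catches the peak but is noisier on ordinary days.
actual = [100.0, 120.0, 300.0, 110.0, 105.0]
trend_model = [102.0, 118.0, 260.0, 111.0, 104.0]
peak_model = [90.0, 135.0, 295.0, 95.0, 120.0]

w = best_blend_weight(actual, trend_model, peak_model)
blended_rmse = rmse(actual, blend(trend_model, peak_model, w))
```

In practice the weight should be chosen on out-of-fold predictions rather than on the data used to fit the base models, otherwise the blend overstates its own accuracy.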

A plot of final predictions for Tofu manufactured:

[Figure: final predictions]
