Learning to build Machine Learning

This weekend I spent hours learning prediction.io. PredictionIO is an open-source Machine Learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time.

Prediction.io makes machine learning very easy to learn. Its architecture makes it easy to train the machine and deploy it. It is based on templates that follow the DASE architecture. And since it is built on top of Apache Spark with MLlib, all the MLlib algorithms are available with prediction.io.
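To make DASE a bit more concrete: as I understand it, the letters stand for Data (source and preparator), Algorithm, Serving and Evaluation. Here is a toy sketch of those four stages in plain Python. It is only a conceptual illustration; every function name below is made up by me and none of it is PredictionIO's actual API (the real templates are written in Scala).

```python
from collections import Counter

# Conceptual DASE pipeline. All names are hypothetical, not PredictionIO's API.

def data_source():
    # D: read raw events (hard-coded here instead of a real event store)
    return [
        ("u1", "view", "i1"),
        ("u1", "buy", "i2"),
        ("u2", "view", "i2"),
        ("u3", "buy", "i2"),
    ]

def preparator(events):
    # D: turn raw events into training data (here, just the item ids)
    return [item for _user, _event, item in events]

def train(items):
    # A: the "algorithm" is a simple popularity count
    return Counter(items)

def serve(model, num):
    # S: turn the trained model into a query result
    return [item for item, _count in model.most_common(num)]

def evaluate(predicted, actual):
    # E: fraction of predicted items that were actually relevant
    if not predicted:
        return 0.0
    return len(set(predicted) & set(actual)) / len(predicted)

model = train(preparator(data_source()))
top = serve(model, 1)
score = evaluate(top, ["i2"])
```

The point is only the separation of concerns: each stage can be swapped out without touching the others, which is roughly what a template gives you.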

I’m not going to explain what prediction.io is; you can read the complete documentation on their site. Instead I want to write about my research so far with prediction.io.

I’m building a recommendation engine to learn from a complex set of data. I want a machine that can learn from that data and recommend the most similar items and the things users are looking for.

Imagine an eCommerce platform: after you have viewed and purchased products for a while, the machine could offer you something like:

  1. Offer you a similar product while you are browsing a product
  2. Email you a promotion based on your purchase history
  3. Offer you a one-time promotion based on the sets of products you are interested in
  4. Predictive search based on the current trend
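To give an idea of what "similar product" can mean, here is a minimal sketch that ranks products by how much their audiences overlap (cosine similarity over the sets of users who touched each product). This is my own toy illustration, not the algorithm a prediction.io template uses, and the users and products are invented.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical view/purchase history: user -> set of product ids
history = {
    "alice": {"phone", "case", "charger"},
    "bob": {"phone", "case"},
    "carol": {"phone", "charger"},
    "dave": {"laptop", "mouse"},
}

def similar_products(history, product, num=3):
    """Rank other products by cosine similarity of their user sets."""
    # Invert the history: product -> set of users who interacted with it
    users_by_item = defaultdict(set)
    for user, items in history.items():
        for item in items:
            users_by_item[item].add(user)

    target = users_by_item.get(product, set())
    scores = []
    for other, users in users_by_item.items():
        if other == product:
            continue
        shared = len(target & users)
        if shared == 0:
            continue  # no common users, so no evidence of similarity
        scores.append((other, shared / sqrt(len(target) * len(users))))

    # Highest score first; ties broken by product name for stable output
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:num]

suggestions = similar_products(history, "phone")
```

With this made-up data, someone browsing the phone would be offered the case and the charger, and never the laptop, because no user connects them.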

There are many applications for recommendation engines, prediction engines and similar-product engines.

What I have learned so far from prediction.io:

  1. The prediction result is not exactly what I wanted, but it is close. And training on more data doesn’t necessarily make it more accurate.
  2. The training part is still unknown to me; I need more time to learn how it works.
  3. The machine cannot be trained in real time, which means it cannot receive an event and recalculate the prediction. You need to shut it down, train and redeploy. There is a script to retrain and redeploy without downtime, but basically the machine needs to be retrained frequently based on the events collected.
  4. Prediction.io uses Apache Spark, and could use a cluster of cloud compute instances for big data.
  5. I got it working for recommendation and similar product prediction based on the real data I have. Predictions come back quite fast.
  6. It takes some time to run pio build and pio train. For large data, pio train took 5 minutes to finish.
  7. Prediction.io is available as a Vagrant box for learning.
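For my own notes, this is roughly what the data going in and out looks like. The sketch only builds the JSON payloads and does not send anything. The ports (7070 for the event server, 8000 for a deployed engine), the field names and the query shape are what I remember from the documentation and the recommendation template, so treat them as assumptions and check them against your own setup.

```python
import json

# Assumed local defaults; adjust to your own installation.
EVENT_SERVER = "http://localhost:7070"
ENGINE = "http://localhost:8000"

def build_event(user_id, event, item_id):
    # One user-to-item interaction, e.g. "view" or "buy"
    return {
        "event": event,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }

def event_url(access_key):
    # Events are posted to the event server with the app's access key
    return f"{EVENT_SERVER}/events.json?accessKey={access_key}"

def build_query(user_id, num):
    # Query shape assumed from the recommendation template
    return {"user": user_id, "num": num}

def query_url():
    return f"{ENGINE}/queries.json"

# "YOUR_ACCESS_KEY" is a placeholder, not a real key
url = event_url("YOUR_ACCESS_KEY")
event_body = json.dumps(build_event("u1", "view", "i42"))
query_body = json.dumps(build_query("u1", 4))
```

New events posted this way are only stored. They do not change what the engine returns until you run pio train again and redeploy, which is the limitation described in point 3.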

I wrote this for my own future reference, but if it helps you or you have something in mind, just let me know.

 
