Creative mash-up of several existing algorithms to produce an optimal predictive model
“It is very hard to fail completely if you aim high enough,” remarked Larry Page, who co-founded Google with Sergey Brin.
The same goes for two junior data analysts, Denis Goh Ee Kin and Soh Zong Xian, both third-year Computer Science students from Asia Pacific University of Technology & Innovation (APU).
Last month, Sepuluh Nopember Institute of Technology, Indonesia, hosted the virtual Pekan Raya Statistika Data Analysis Competition 2021, which drew more than 4,000 participants from Southeast Asia.
This is a data analysis competition for university students in Southeast Asia, with the goal of nurturing competitiveness and critical thinking in data analysis.
With their algorithm to predict numeric values and categorical variables using XGBoost Regressor and Random Forest Classifier, Denis and Zong Xian (team Exponentials) received the Best Algorithm Award in the final round.
In the preliminary round, the jury was impressed by their creative mash-up of several existing algorithms to produce an optimal predictive model. Their work included exploring and cleaning the raw dataset (Train and Test data) and applying ensemble methods, a machine learning technique that combines several base models to produce one optimal predictive model.
They used these algorithms to classify the Submission dataset, focusing on feature significance to determine which variable is most relevant to the target variable.
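For readers curious about what "combining several base models" and "feature significance" look like in practice, here is a minimal sketch. It is not the team's code: their entry used XGBoost Regressor and Random Forest Classifier, whereas this illustration swaps in a much simpler pure-Python ensemble of one-split decision trees ("stumps") trained on bootstrap samples and combined by majority vote, and it measures feature significance as the accuracy lost when one feature's values are shuffled. The data is synthetic, and every function name (`fit_stump`, `fit_ensemble`, `predict`, `accuracy`, `permutation_importance`) is hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Fit a one-split tree: pick the feature/threshold with the fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must leave data on both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]


def fit_ensemble(rows, labels, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble


def predict(ensemble, row):
    """Combine the base models by majority vote."""
    votes = [l if row[f] <= t else r for f, t, l, r in ensemble]
    return majority(votes)


def accuracy(ensemble, rows, labels):
    """Fraction of rows the ensemble classifies correctly."""
    return sum(predict(ensemble, r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(ensemble, rows, labels, feature, seed=0):
    """Accuracy lost when one feature's column is shuffled across rows."""
    rng = random.Random(seed)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return accuracy(ensemble, rows, labels) - accuracy(ensemble, shuffled, labels)


# Synthetic data (an assumption for illustration only):
# feature 0 decides the label, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]

ensemble = fit_ensemble(rows, labels)
importances = [permutation_importance(ensemble, rows, labels, f) for f in range(2)]
print("training accuracy:", accuracy(ensemble, rows, labels))
print("importance per feature:", importances)
```

In this toy setup, shuffling the feature that drives the label should cost the ensemble far more accuracy than shuffling the noise feature, which is the sense in which one variable is "most relevant" to the target. Production libraries such as those the team used offer far more capable versions of both ideas.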
Student-centered learning
Mr Mafas Raheem, their supervisor, was particularly helpful in developing and evaluating their algorithm, as well as advising them on machine learning methods.
He also provided them with some useful keywords to improve their research process, as well as machine learning best practices and feedback on their algorithms.
“Because the preliminary round was just one week long, we finished our model implementation and documented it using the CRISP-DM methodology.
“Our ensemble techniques combine multiple models to produce an optimal prediction model,” said Denis, who added that following graduation he intends to work as a Data Analyst.
Given the opportunity, he would like to gain expertise by working for a firm where he can build new algorithms or upgrade existing ones.
Zong Xian says that one of the algorithms they developed was influenced by the intelligent debates at the APU Analytics Club's committee meetings.
He is impressed by data specialists in general and aims to become a data engineer and to work on an end-to-end data science/analytics project.