A. Masih
Abstract
Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning approaches have emerged as a solution to these issues. It is well established that the type of input variables largely affects the performance of an algorithm; however, it is not yet known why one algorithm is preferred over another for a given task. This work aims to highlight the underlying principles of machine learning techniques and their role in enhancing prediction performance. The study reviews the 38 most relevant studies in the field of environmental science and engineering that have applied machine learning techniques during the last six years. The review explores several aspects of these studies: 1) the role of input predictors in improving prediction accuracy; 2) where, geographically, the studies were conducted; 3) the major techniques applied for estimating or forecasting pollutant concentrations; and 4) whether these techniques were based on linear regression, neural network, support vector machine or ensemble learning algorithms. The results suggest that studies applying machine learning techniques were mainly conducted in Europe and America. Furthermore, a factorial analysis (multi-component analysis) shows that pollution estimation is generally performed using ensemble learning and linear regression based approaches, whereas forecasting tasks tend to use neural network and support vector machine based algorithms.
A. Masih
Abstract
For pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, the results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and a regression tree built with the M5 algorithm. The prediction of sulphur dioxide was based on atmospheric pollutants and meteorological parameters, and model performance was assessed using four evaluation measures: the correlation coefficient, mean absolute error, root mean squared error and relative absolute error. The results suggest that 1) the homogeneous ensemble classifier random forest performs better than single base statistical and machine learning algorithms; 2) using single base classifiers as the base classifier within bagging improves their prediction accuracy; and 3) the heterogeneous ensemble algorithm voting can match or outperform the homogeneous classifiers (random forest and bagging). Overall, the study demonstrates that the ensemble classifiers random forest, bagging and voting can outperform single base traditional statistical and machine learning algorithms, such as linear regression, support vector machine for regression and multilayer perceptron, in modeling the atmospheric concentration of sulphur dioxide.
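The evaluation and ensemble ideas in this abstract can be illustrated with a small plain-Python sketch. It is a hypothetical illustration, not the author's implementation. `fit_line` (one-predictor least squares) and `fit_knn` (k-nearest neighbours) are stand-in base learners chosen for brevity, not the models used in the study. Bagging averages copies of one learner fitted on bootstrap resamples, and voting averages the predictions of different learners. The data are synthetic. The relative absolute error assumes the common definition: total absolute error divided by that of always predicting the observed mean, expressed as a percentage.

```python
import math
import random
from statistics import mean


# --- The four evaluation measures named in the abstract ---------------------

def correlation_coefficient(y, p):
    """Pearson correlation between observed values y and predictions p."""
    my, mp = mean(y), mean(p)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in p))
    return cov / (sy * sp)


def mean_absolute_error(y, p):
    return mean(abs(a - b) for a, b in zip(y, p))


def root_mean_squared_error(y, p):
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y, p)))


def relative_absolute_error(y, p):
    """Total absolute error relative to always predicting mean(y), in percent."""
    my = mean(y)
    baseline = sum(abs(a - my) for a in y)
    return 100.0 * sum(abs(a - b) for a, b in zip(y, p)) / baseline


# --- Stand-in base learners (hypothetical, one predictor each) --------------

def fit_line(xs, ys):
    """Ordinary least squares y = intercept + slope * x; returns a predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # degenerate sample: flat line at mean
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regressor: average y of the k closest training x."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda xy: abs(xy[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


# --- Ensembles ----------------------------------------------------------------

def bagging(fit, xs, ys, n_estimators=25, seed=0):
    """Homogeneous ensemble: fit the same learner on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)  # average member predictions


def voting(*models):
    """Heterogeneous ensemble for regression: average different models."""
    return lambda x: mean(m(x) for m in models)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data for illustration only; not the study's SO2 measurements.
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    train_x, train_y, test_x, test_y = xs[:150], ys[:150], xs[150:], ys[150:]

    candidates = {
        "linear": fit_line(train_x, train_y),
        "bagged linear": bagging(fit_line, train_x, train_y),
        "voting": voting(fit_line(train_x, train_y), fit_knn(train_x, train_y)),
    }
    for name, model in candidates.items():
        preds = [model(x) for x in test_x]
        print(f"{name:14s} r={correlation_coefficient(test_y, preds):.3f} "
              f"MAE={mean_absolute_error(test_y, preds):.3f} "
              f"RMSE={root_mean_squared_error(test_y, preds):.3f} "
              f"RAE={relative_absolute_error(test_y, preds):.1f}%")
```

Running the script prints the four measures for each candidate on held-out synthetic data. In a real comparison, the stand-in learners would be replaced by the models named in the abstract (for example, random forest, M5 regression trees, support vector regression or a multilayer perceptron). The bagging and voting wrappers accept any function that returns a predictor.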