The training data were used to perform cross-validation and a grid search for model tuning. Once the optimal parameters were found, the final model was fit to the entire training dataset using those parameters. The performance of the final model was then evaluated on the test dataset.
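The study performed this workflow with R's caret package; the general pattern of grid search over cross-validation folds can be sketched generically as below. The `fit` and `score` callables are hypothetical placeholders, not the study's actual models:

```python
from itertools import product
from statistics import mean

def cross_val_score(train, params, k=5, fit=None, score=None):
    """Mean score of a model over k folds of the training data."""
    folds = [train[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held_out = folds[i]
        rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = fit(rest, params)          # train on k-1 folds
        scores.append(score(model, held_out))  # evaluate on held-out fold
    return mean(scores)

def grid_search(train, grid, **kw):
    """Return (params, score) for the best-scoring parameter combination."""
    best = None
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        s = cross_val_score(train, params, **kw)
        if best is None or s > best[1]:
            best = (params, s)
    return best
```

The winning parameter set is then used to refit one final model on the full training data, exactly as described above.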
The models in this study were tuned manually. Manual tuning involves specifying candidate values for the model parameters directly, which gives fine-grained control over the tuning process. Manual tuning was performed via the tuneGrid argument, as follows:
For the RF algorithm, the mtry (number of randomly selected predictors) and ntree (number of trees grown) parameters were tuned. In this study, mtry values of 1–3 were used, corresponding to the three RGB bands, and ntree values of 1–1000 were tested. The final values for the best model were mtry = 3 and ntree = 200.
For the SVM algorithm, the cost (C) and kernel width (sigma) parameters (which control the penalty for misclassification and the influence of each support vector on the nonlinearity of the hyperplane) were tuned. Values in the ranges 1–10 and 0.1–0.9 were used for the cost and sigma parameters, respectively, and the final values that gave the best accuracy were sigma = 0.05 and C = 5.
For the ANN algorithm, the size and decay parameters (the number of neurons in the hidden layer and the regularization parameter that prevents overfitting) were tuned. Size values of 1–25 and decay values of 0.1–0.9 were tested. The final values for the best model were size = 9 and decay = 0.4. Figure 5 shows the results of manual tuning using the parameters defined during training for the RF, SVM, and ANN algorithms.
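The grids and final values reported above can be collected in one place; this is a plain data sketch (parameter names follow the conventions of the respective R packages), recording the values exactly as reported in the text:

```python
# Search grids and final selected values for the manually tuned models.
tuning = {
    "rf":  {"grid": {"mtry": range(1, 4), "ntree": range(1, 1001)},
            "final": {"mtry": 3, "ntree": 200}},
    "svm": {"grid": {"C": range(1, 11), "sigma": [x / 10 for x in range(1, 10)]},
            "final": {"C": 5, "sigma": 0.05}},
    "ann": {"grid": {"size": range(1, 26), "decay": [x / 10 for x in range(1, 10)]},
            "final": {"size": 9, "decay": 0.4}},
}
```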
For the XGBoost algorithm, model tuning was performed as follows: the eta parameter was set to 0.1, max_depth to 6, subsample to 0.7, and colsample_bytree to 1.
To prevent overfitting, it was necessary to set the step-size shrinkage used in the weight updates. After each boosting step, the newly obtained feature weights are shrunk by eta, which makes the update process more conservative. The step-size shrinkage value ranges from 0 to 1; a lower eta value makes the model more robust to overfitting. The max_depth parameter is the maximum depth of a tree and can range from 1 to infinity. The subsample parameter is the fraction of the training instances sampled for each tree; a value of 0.5 means that XGBoost randomly selects half of the training data to grow each tree, which helps prevent overfitting. The subsample parameter ranges from 0 to 1. The colsample_bytree parameter is the subsample ratio of columns when constructing each tree, and it also ranges from 0 to 1. Figure 6 illustrates one of the decision trees generated by XGBoost, and Fig. 7 presents a bar graph of the resulting model using the predefined parameters.
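The fixed XGBoost settings described above can be written as a single parameter mapping; this sketch uses XGBoost's native parameter names:

```python
# XGBoost settings used in this study, with the role each parameter plays.
xgb_params = {
    "eta": 0.1,               # step-size shrinkage applied after each boosting step
    "max_depth": 6,           # maximum depth of each tree
    "subsample": 0.7,         # fraction of training rows sampled per tree
    "colsample_bytree": 1.0,  # fraction of columns sampled per tree
}
```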
Pansharpening (optical and thermal data merging) and pixel value extraction
The process of pansharpening, i.e., combining the optical and thermal data, was crucial for this research. The first step was to assign a false coordinate system to both the optical and thermal images.
False coordinates were assigned in QGIS using the Georeferencer feature, following a map-to-image approach based on Google Maps. Coordinates were determined from four control points placed at the upper-right, upper-left, lower-right, and lower-left corners of the optical and thermal images. A linear transformation with the nearest-neighbor resampling method was used.
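Nearest-neighbor resampling, as chosen in the georeferencing step, assigns each output pixel the value of the closest source pixel, so no new pixel values are interpolated. A minimal sketch for a single-band 2D image:

```python
def resample_nearest(img, new_h, new_w):
    """Nearest-neighbor resampling of a single-band image (list of rows).

    Each output pixel copies the closest source pixel; original pixel
    values are preserved exactly, which matters for later extraction.
    """
    old_h, old_w = len(img), len(img[0])
    return [
        [img[min(old_h - 1, int(r * old_h / new_h))]
            [min(old_w - 1, int(c * old_w / new_w))]
         for c in range(new_w)]
        for r in range(new_h)
    ]
```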
Using these results, the two images could then be combined with the pansharpening method, which employed a cubic resampling algorithm. The coordinates played an important role in the merging process. In practice, the optical and thermal data assigned the false coordinates did not overlap precisely: a gap of several millimeters remained, resulting in an imperfect merge. This affected the classification process and its results.
The classification process used point data containing the pixel values from each channel/band. Point data were obtained by randomly generating 10 points for each class, 50 points in total. The point-sampling tool in QGIS was used for the extraction. These data were then divided into training and test sets using the modified Pareto principle.
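Dividing the 50 sampled points by a Pareto-style principle can be sketched as below; the 80/20 ratio is the conventional Pareto split and is an assumption here, since the text does not state the exact ratio used:

```python
import random

def split_points(points, train_frac=0.8, seed=42):
    """Shuffle sampled points and split them into train/test subsets.

    train_frac=0.8 reflects the conventional Pareto 80/20 split
    (assumed ratio); the seed makes the split reproducible.
    """
    pts = list(points)
    random.Random(seed).shuffle(pts)
    cut = round(len(pts) * train_frac)
    return pts[:cut], pts[cut:]
```

With 50 points this yields 40 training and 10 test points.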
Because of the imperfect pansharpening, the extracted pixel values were not fully accurate, which in turn degraded the classification results. This was compounded by the optical and thermal sensors being separate devices.
The performance of each algorithm was compared in terms of accuracy and Cohen’s Kappa. The classification results of XGBoost, ANN, RF, and SVM are shown in Fig. 8. The ANN and RF algorithms yielded low results, with an accuracy of 40% and a Cohen’s Kappa of 0.25 (low). The SVM algorithm obtained an accuracy of 53% and a Cohen’s Kappa of 0.42 (moderate). The XGBoost algorithm obtained an accuracy of 60% and a Cohen’s Kappa of 0.5 (moderate). Figure 9 presents the accuracy and Cohen’s Kappa of each algorithm using the optical data input in graphical form.
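Cohen's Kappa measures agreement between predicted and reference labels beyond what chance alone would produce; a minimal sketch of the computation:

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    ct, cp = Counter(y_true), Counter(y_pred)
    pe = sum(ct[c] * cp[c] for c in ct) / n ** 2          # chance agreement
    return (po - pe) / (1 - pe)
```

Unlike raw accuracy, kappa discounts agreement expected from the class distributions alone, which is why it is reported alongside accuracy here.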
The classification results using the FLIR thermal data are shown in Fig. 10. XGBoost, ANN, and SVM yielded the same results, with an accuracy of 53% and a Cohen’s Kappa of 0.42 (moderate). The RF algorithm obtained lower results, with an accuracy of 47% and a Cohen’s Kappa of 0.33 (moderate). Figure 11 presents the accuracy and Cohen’s Kappa for each algorithm using the thermal data.
Figure 12 presents the classification results obtained using the combined optical and thermal FLIR data, while Fig. 13 presents the accuracy and Cohen’s Kappa of each algorithm for the combined input in graphical form. The RF algorithm obtained an accuracy of 51% and a Cohen’s Kappa of 0.39 (moderate). The XGBoost, ANN, and SVM algorithms obtained accuracies of 73%, 66%, and 57%, respectively. The Cohen’s Kappa values were 0.63 (high) for XGBoost, 0.58 (moderate) for ANN, and 0.46 (moderate) for SVM. The achieved accuracy is sufficient for practical uses25.
Time performance comparison
The time performance of each process was measured for all algorithms using the profvis package in RStudio. Figure 14 shows the average duration each algorithm required for the training and classification processes.
The ANN algorithm had the longest training duration, followed by SVM and RF; XGBoost trained the fastest. The SVM and RF algorithms required more time for the classification process than the ANN and XGBoost algorithms. Overall, XGBoost had the best time performance across all stages, and the ANN algorithm was the least efficient.
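The profiling itself was done with R's profvis; the underlying measurement, averaging a stage's wall-clock duration over several runs, can be sketched in a language-agnostic way:

```python
import time

def time_stage(fn, *args, repeats=5):
    """Average wall-clock duration of one training or classification stage."""
    durations = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)                       # run the stage once
        durations.append(time.perf_counter() - t0)
    return sum(durations) / repeats     # mean duration in seconds
```

Averaging over repeated runs smooths out transient system load, which matters when stage durations differ by only fractions of a second.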
Plastic waste above and below the water surface
Plastic waste in the riparian zone of rivers comes from various sources, for example, single-use plastic packaging and shampoo or soap wrappers. This plastic waste does not decompose and either remains in the riparian zone or is carried downstream by water currents.
Plastic waste recorded by the optical and thermal sensors lies both above and below the water surface, which makes accurate classification challenging. Plastic waste below the water surface has a lower thermal value than waste above it. The thermal value of plastic above the water surface is often similar to that of other objects, such as vegetation and trunks/branches, while the thermal value of plastic waste below the water surface is the same as that of the water (Fig. 15).
The optical sensor provides pixel values that better distinguish plastic waste from other objects, regardless of whether the waste is above or below the water surface. The spectral values of plastic span the highest range, and those of vegetation the lowest, compared with other objects (Fig. 16).
The combination of optical and thermal data yielded moderate results. The increase in accuracy and Cohen’s Kappa was modest, even though combined optical and thermal data were used in the classification process. Classification using the thermal data alone achieved 40–50% accuracy, the optical data alone 50–60%, and the merged data 60–70%. Figure 17 compares the plastic waste classification results from the four algorithms using the combined optical and thermal data, with plastic waste both above and below the surface. Plastic waste below the water surface is visible only in shallow, clear water, not in deep or murky water, and the classification results there are unsatisfactory.