# Estimate pH easily and with high accuracy using common pH test strips, powered by machine learning and supported on mobile devices

Figure 2 shows a collection of 130 captures of experimental color changes of the pH paper (at 350 Lux) over the range 0–14 at intervals of ~0.1 pH. It is worth mentioning that traditional estimation based on the color change of pH paper carries a large uncertainty (about 2 pH units). This high variation leads to notable misestimation when the color is judged by eye. This observation encouraged us to develop a new, simple and more accurate method of pH measurement. The experiment was therefore extended to cover three lighting conditions representative of workplaces where a user might work: 350, 200 and 20 Lux. In addition, RGB color codes were collected at seven different locations on each capture to confirm the uniformity of the pH test paper color. In total, the dataset includes 2689 experimental RGB values from the different lighting conditions.

To better understand the results observed under the different lighting conditions, Figure 3 shows the exploratory data analysis (EDA) of the RGB color codes versus pH at light intensities of 20, 200 and 350 Lux.

Across the full pH range, the color-code points fall into three sections. At all three light intensities investigated (20, 200 and 350 Lux), the red, green and even blue color codes varied significantly over the pH range 2.5 to 9. Notably, the blue codes at the low light intensity of 20 Lux (a rather dark workplace) deviated from those obtained at medium or high light intensity, suggesting that low-light conditions should be avoided in future tests. In contrast, the red and green codes showed no significant difference across light intensities. The results also indicate that strong alkalinity (>9) or acidity (<2.5) affects the color response and may produce less accurate predictions in those parts of the pH range. This finding may therefore encourage the scientific community to prepare more sensitive materials that work in strongly acidic and/or strongly basic media.

In addition, it is critical to identify and assess the degree to which each parameter depends on the others. This knowledge helps set expectations about these interdependencies and can lead to more efficient pH devices and color-sensitive materials. For this reason, as part of the machine learning strategy, the Pearson correlation coefficient (\(r_{xy}\)) between the pH parameters was studied based on Eqs. (5) and (6):

$$\mathrm{cov}_{x,y}=\frac{\sum_{i=1}^{N}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{N-1}$$

(5)

$$r_{xy}=\frac{\sum_{i=1}^{N}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_{i}-\overline{x}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(y_{i}-\overline{y}\right)^{2}}}$$

(6)

where \(N\) is the number of samples, \(x_{i}\) and \(y_{i}\) are the individual RGB color-code and pH values, respectively, and \(\overline{x}\) and \(\overline{y}\) are the corresponding mean values (\(\overline{y}\) being the average pH value).
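As a minimal sketch of Eqs. (5) and (6), the covariance and the Pearson coefficient can be computed directly from their definitions. The function names and the five sample points below are hypothetical and serve only as illustration; they are not taken from the study's dataset.

```python
import math


def covariance(xs, ys):
    """Sample covariance, Eq. (5): summed products of deviations over N - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Pearson correlation coefficient, Eq. (6)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = (
        math.sqrt(sum((x - mean_x) ** 2 for x in xs))
        * math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    )
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return numerator / denominator


# Hypothetical red-channel codes and pH values, for illustration only.
red = [230, 210, 180, 140, 90]
ph = [1.0, 3.0, 5.0, 7.0, 9.0]
print("cov =", covariance(red, ph))
print("r   =", round(pearson_r(red, ph), 3))
```

A red channel that falls as pH rises gives a negative coefficient, which is the sign convention behind the negative values reported for red and green.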

The correlation between the pH parameters is shown as a heat map in Fig. 4. The results reflect a strong negative correlation between pH and red (-0.77). pH also has a moderate negative correlation with green (-0.38). Blue has a very low correlation with pH (0.044) compared with that observed for red or green, which indicates that blue has less impact on the machine learning predictions than red and green. Likewise, workplace lighting showed no significant correlation with pH (-0.03). Therefore, pH test strip colors can be captured safely regardless of light intensity.

### Machine Learning Model Prediction

Using the experimental data, a preliminary analysis of machine learning regression techniques was performed with optimal hyperparameters for the K-Nearest Neighbors (KNN), Linear, Lasso, Elastic Net, AdaBoost, Neural Network, Random Forest, Support Vector Machine (SVM) and Gradient Boosting regression algorithms^{28,29,30}. The coefficient of determination (R^{2}) and the corresponding regression error metrics, namely Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Square Error (MSE), were estimated, as shown in Figure 5 and recorded in Table 1.
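The four evaluation metrics named above follow standard definitions, sketched below in plain Python. The helper name `regression_metrics` and the example values are hypothetical; the study's own evaluation code is not shown in the text.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE and MAE for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n       # mean square error
    rmse = math.sqrt(mse)                         # square root of the MSE
    mae = sum(abs(r) for r in residuals) / n      # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    r2 = 1.0 - sum(r * r for r in residuals) / ss_tot
    return {"R2": r2, "MSE": mse, "RMSE": rmse, "MAE": mae}


# Hypothetical measured and predicted pH values, for illustration only.
measured = [2.0, 4.5, 7.0, 9.5, 12.0]
predicted = [2.2, 4.4, 7.1, 9.0, 12.3]
print(regression_metrics(measured, predicted))
```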

The KNN model, with an optimal hyperparameter of five neighboring points, clearly gives a remarkable R^{2} (0.993) together with the lowest MSE, RMSE and MAE errors (0.012, 0.320 and 0.182, respectively) compared with the other models. In addition, the coefficient of variation of the root mean square error (CVRMSE) of the KNN model, 4.077, indicates higher stability than the other models. K-fold cross-validation with 3, 5, 10 and 20 folds was also tested to confirm the stability of the model. No significant differences were found between the results, which validates the KNN model.
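The following is a minimal sketch of KNN regression with five neighbors, evaluated by K-fold cross-validation. Several points are assumptions rather than details from the study: Euclidean distance in RGB space, unweighted averaging of neighbor pH values, CVRMSE defined as RMSE divided by the mean observed value (in percent), and a synthetic linear color-to-pH relationship used only to make the example runnable.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=5):
    """Mean pH of the k training colors closest to `query` (Euclidean, RGB)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def rmse(y_true, y_pred):
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cvrmse_percent(y_true, y_pred):
    """Assumed definition: RMSE relative to the mean observed value, in %."""
    return 100.0 * rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


def kfold_rmse(X, y, n_folds=5, k=5, seed=0):
    """RMSE of each held-out fold after shuffling the sample indices."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        scores.append(rmse([y[i] for i in test_idx], preds))
    return scores


# Synthetic data (NOT the study's dataset): pH 0..14 in steps of 0.1 with a
# made-up, noisy linear RGB response.
rng = random.Random(42)
ph_values = [i * 0.1 for i in range(141)]
colors = [(255 - 12 * p + rng.gauss(0, 2),
           60 + 10 * p + rng.gauss(0, 2),
           40 + 5 * p + rng.gauss(0, 2)) for p in ph_values]

for n_folds in (3, 5, 10, 20):
    scores = kfold_rmse(colors, ph_values, n_folds=n_folds)
    print(n_folds, "folds: mean RMSE =", round(sum(scores) / len(scores), 3))
```

If the mean RMSE stays roughly constant as the number of folds changes, that is consistent with the stability argument made above.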

For further insight, the relationship between the model’s predictions (based on the test data) and the experimentally obtained pH values is shown in the scatter plots in Figure 6. The Linear regression, Elastic Net and Neural Network algorithms fail to capture all of the experimental points, especially in the strongly acidic and strongly basic pH ranges. In contrast, the KNN, Gradient Boosting, Random Forest and AdaBoost algorithms give precise estimates along the diagonal line, and any of them could be selected for the subsequent deployment step. Although all of these algorithms show high performance and minimal bias, KNN was chosen for deployment in the machine learning mobile application because of its lowest error (RMSE; 0.32) and higher stability (CVRMSE; 4.08).

It is now clear that the KNN model can successfully reveal the underlying patterns of the RGB color codes for pH estimation based on the experimental data collected. Therefore, the machine learning method based on this model was extended and used to develop a multifunctional platform capable of predicting pH with high accuracy using common pH test paper. The online mobile application of the predictive model was developed using Python and Streamlit Cloud (freely available) and allows highly predictive determination of pH as a function of the RGB color code of plain pH paper.

As shown in Figure 7, the mobile application works in three steps. It starts with an input step, in which a capture of the pH strip (taken immediately after immersion in the target solution) is provided. For added convenience, three input options are available: upload a picture, use the mobile camera, or enter an RGB color code. This step is followed by a built-in machine learning process that is not controlled by the user. Finally, the predicted pH value appears on the screen.
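The three-step flow can be sketched as follows for the RGB color code option. This is an illustrative outline only: the helper names, the accepted input formats and the placeholder model are all hypothetical, and the actual application's code is not described in the text. Any trained regressor that maps an RGB triple to a pH value could take the place of the placeholder.

```python
def parse_rgb(text):
    """Step 1: parse '#RRGGBB' or 'R,G,B' into three ints in 0..255."""
    text = text.strip()
    if text.startswith("#"):
        if len(text) != 7:
            raise ValueError("expected a hex code of the form #RRGGBB")
        return tuple(int(text[i:i + 2], 16) for i in (1, 3, 5))
    parts = [int(p) for p in text.split(",")]
    if len(parts) != 3 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected three comma-separated values in 0..255")
    return tuple(parts)


def mean_rgb(samples):
    """Average several RGB samples, e.g. pixels read at a few strip positions."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    return tuple(sum(s[c] for s in samples) / n for c in range(3))


def estimate_ph(rgb, model):
    """Steps 2 and 3: run the model on the color and round for display."""
    return round(model(rgb), 1)


def placeholder_model(rgb):
    """Stand-in for a trained regressor; always returns a fixed value."""
    return 7.0


rgb = parse_rgb("#C8A03C")
print("input color:", rgb)
print("estimated pH:", estimate_ph(rgb, placeholder_model))
```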

Our approach has significant advantages over the methods already in use. Figure 8 shows a fair comparison of pH instruments, pH strips and the current study.

Furthermore, Fig. 9 compares the pH value estimated by the proposed mobile application (the output result) with the actual pH value. Across the entire pH range (acidic or basic), the close agreement between actual and estimated values reflects the high accuracy of the ML model used.

Solmaz et al.^{31} previously studied colorimetric detection of pH paper using ML, as shown in Table 2.

In addition, four different types of smartphones were used to check the accuracy of the pH prediction for three buffer solutions (pH = 3, 7 and 10). Default settings were used to avoid any influence from the smartphone. As shown in Figure 10 and Table 3, there is no significant difference in the pH estimates between the various smartphones, with each type being more than 90% accurate.

Additionally, Table 4 shows recommended conditions and limitations for using the application to achieve more accurate predictions.

Overall, the present findings solve the problem of pH accuracy using common pH paper without requiring additional costly and time-consuming experimental work. Moreover, our method avoids the high cost and maintenance required by conventional pH meters.