This report provides an introduction to the emerging field of Statistical Performance Monitoring for photovoltaic (PV) systems and a survey of the development of these fault detection systems and their applications.
This survey found four primary methods used for identifying faults: (i) identifying faulty electrical signatures, (ii) comparing historical performance to actual performance, (iii) comparing predicted performance to actual performance and (iv) comparing the relationships between different PV systems or subsystems. The four approaches used to implement these methods are applying machine learning algorithms, applying statistical tests, specifying computational rules and generating simulations using models.
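As an illustration of method (iii) combined with a computational rule, the following minimal Python sketch compares measured power against a predicted value and flags a fault when the ratio drops too low. It is not taken from any of the surveyed systems. The simplified power model, the rated power, the temperature coefficient, the 0.8 ratio threshold, the minimum irradiance cutoff and all names (`Sample`, `predicted_power_w`, `is_fault`) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    irradiance_w_m2: float   # plane-of-array irradiance
    cell_temp_c: float       # cell temperature
    measured_power_w: float  # measured DC power


def predicted_power_w(sample, rated_power_w=5000.0, temp_coeff_per_c=-0.004):
    """Assumed simplified model: rated power scaled by irradiance and temperature."""
    irradiance_factor = sample.irradiance_w_m2 / 1000.0
    temp_factor = 1.0 + temp_coeff_per_c * (sample.cell_temp_c - 25.0)
    return rated_power_w * irradiance_factor * temp_factor


def is_fault(sample, min_ratio=0.8, min_irradiance_w_m2=200.0):
    """Rule: flag a fault when measured power is below min_ratio of predicted power."""
    if sample.irradiance_w_m2 < min_irradiance_w_m2:
        return False  # too little light to judge reliably (assumed cutoff)
    return sample.measured_power_w < min_ratio * predicted_power_w(sample)


# Hypothetical readings, not measured data.
healthy = Sample(irradiance_w_m2=800.0, cell_temp_c=45.0, measured_power_w=3600.0)
faulty = Sample(irradiance_w_m2=800.0, cell_temp_c=45.0, measured_power_w=2400.0)

print(is_fault(healthy), is_fault(faulty))
```

A real system would replace the assumed model with a calibrated simulation or a trained predictor, and would tune the threshold to the desired sensitivity.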
As shown in Figure 1, the research papers studied indicate that Asia leads the world in studying and developing PV fault detection systems, followed by Europe. The parameters most commonly used by fault detection systems are current and/or voltage (AC or DC) (25%), irradiance (19%), temperature (17%) and IV curve data (12%).
The study also found clear machine learning algorithm preferences. Among the papers studied, artificial neural networks are the most popular (30%), followed by K Nearest Neighbors (10%), fuzzy systems (8%) and support vector machines and linear regression (7%).
In addition to explaining the statistical algorithms in use and studying the approaches used for identifying faults, this report also reviews the different sources of data used by PV fault detection systems. The input data for PV fault detection comes from a variety of devices and sources, including sensors connected at the site, commercial weather stations, inverters, optimizers and IV curve tracers. Depending on the device system architecture, different parameters are available at different frequencies and accuracies.
This study suggests that a machine learning training strategy using training data close in time to the testing data provides better results, and that, for some machine learning algorithms, performance data and environmental data yield comparable accuracy.
In a head-to-head comparison of 8 of the 22 summarized algorithms, in which each was fed the same data from a live PV system, the algorithms were found to have very different sensitivities.