PV Nowcasting

Project summary

Using deep machine learning techniques, this project is exploring whether more accurate predictions of solar electricity generation could reduce the amount of "spinning reserve" required.

Name: Solar PV Nowcasting
Status: Live
Project reference number: NIA2_NGESO002
Start date: Sep 2021
Proposed end date: Jul 2023

Strategy theme: Net zero and the energy system transition
Funding mechanism: NIA_RIIO-2
Technology: Comms and IT, Measurement, Modelling, Photovoltaics
Expenditure: £500,000
Third party collaborators: Open Climate Fix (OCF)

Summary
Summary

Using deep machine learning techniques, this project is exploring whether more accurate predictions of solar electricity generation could reduce the amount of "spinning reserve" required. This would reduce carbon emissions and costs to end-users, as well as increase the amount of solar generation the grid can handle.

Benefits

Not required.

ENA smarter networks portal

Learnings

Outcomes

The outcomes of the project to date are (2024 progress):

  • Fully operational PV Nowcasting service running on two ML models:
    PVNet for 0-6 hours
    Blend of PVNet and National_xg for 6-8 hours
    National_xg from 8 hours onwards

  • Accuracy improvement over the previous OCF model of approximately 30% for the GSP and National forecasts (4-8 hours), resulting in forecasts approximately 40% more accurate than the BMRS model and over 40% more accurate than the PEF forecast (for 0-8 hours).

  • Probabilistic forecasts for all horizons
    Backtest runs for the DRS project
    UI including a new Delta view, dashboard view and probabilistic display
    UI speedup, with query times reduced from 20 seconds to under 1 second

  The most significant of these is the achievement of the target set by NG-ESO of a 20% reduction in MAE. A reduction of this size is extremely large in renewable forecasting, and is the result of numerous machine learning improvements.
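The horizon-based combination of PVNet and National_xg described above can be sketched roughly as follows. This is a minimal illustration only: the function name, array layout, and the linear cross-over between the two models in the 6-8 hour window are assumptions, as the report does not specify the actual blending scheme.

```python
import numpy as np

def blend_forecasts(horizons_h, pvnet, national_xg,
                    blend_start=6.0, blend_end=8.0):
    """Blend two forecasts by horizon (hypothetical sketch).

    - Up to `blend_start` hours: PVNet only.
    - Between `blend_start` and `blend_end`: linear blend of the two.
    - Beyond `blend_end`: National_xg only.
    All arrays are per-horizon generation estimates (e.g. MW).
    """
    horizons_h = np.asarray(horizons_h, dtype=float)
    pvnet = np.asarray(pvnet, dtype=float)
    national_xg = np.asarray(national_xg, dtype=float)

    # Weight on National_xg ramps linearly from 0 to 1 across the blend window.
    w = np.clip((horizons_h - blend_start) / (blend_end - blend_start), 0.0, 1.0)
    return (1.0 - w) * pvnet + w * national_xg
```

For example, at a 7-hour horizon the two models would each receive a weight of 0.5 under this scheme, while at 6 hours and below the output is PVNet alone.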

  
Lastly, the forecast from Open Climate Fix is delivered completely open and documented. Resilience was significantly increased over the project duration, resulting in over 99.5% availability. This setup is implementable by NG-ESO, with all the infrastructure defined in code to allow replicability.
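As a quick illustration of what the 99.5% availability figure implies, assuming availability is measured continuously over a 365-day year (the measurement window is not stated in the report):

```python
# Downtime budget implied by a given availability level.
hours_per_year = 365 * 24          # 8760 hours
availability = 0.995               # the reported >99.5% availability
max_downtime_hours = hours_per_year * (1 - availability)
# about 43.8 hours of downtime per year at exactly 99.5% availability;
# the service's actual figure was better than this.
```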

Lessons Learnt

 
In the course of WP1 and WP2, the project identified the following lessons:
   
Never underestimate the importance of cleaning up and checking data in advance   
Several approaches to loading data were tried, from on-the-fly loading to pre-preparing, and automatic and visual tests of the data were instituted to ensure the various data sources were always lined up correctly.
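Automatic alignment checks of the kind described above can be as simple as the following sketch. The frame names (`pv`, `nwp`) and the specific checks are illustrative assumptions, not the project's actual test suite:

```python
import pandas as pd

def check_alignment(pv: pd.DataFrame, nwp: pd.DataFrame) -> None:
    """Sanity-check that two time-indexed data sources line up.

    Hypothetical inputs: `pv` (PV generation) and `nwp` (weather inputs),
    both indexed by UTC timestamp. Raises AssertionError on problems.
    """
    for name, df in [("pv", pv), ("nwp", nwp)]:
        # Indices must be sorted and timezone-aware, or joins go wrong silently.
        assert df.index.is_monotonic_increasing, f"{name} index not sorted"
        assert df.index.tz is not None, f"{name} index not timezone-aware"
    # The sources must overlap in time, or joining them yields an empty set.
    overlap = pv.index.intersection(nwp.index)
    assert len(overlap) > 0, "no overlapping timestamps between pv and nwp"
```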
   
Having infrastructure as code allows the main production service to run uninterrupted   
Having code to easily instantiate infrastructure is very useful for the efficient management of environments, and ensured the project could bring the algorithm into productive use. The Terraform software tool was used, which makes spinning up (and down) environments easy and repeatable. Being able to spin up new environments allowed the project to test new features in development environments while the main production service kept running uninterrupted.
 
Using Microservices to “start simple and iterate” accelerates development   
Using a microservice architecture allowed the project to upgrade individual components as benefits were identified, independently of other components' behaviour. This has been very useful when building out the prototype service, as it allowed the project team to start with a simple architecture - even a trivial forecast model - and iteratively improve the functionality of the components. For example, starting with a single PV data provider allowed the project to get a prototype working, and in WP3 an additional PV provider will be onboarded.
   
Data processing may take longer than expected   
While it was initially planned to extend the dataset back to 2016 for all data sources during WP2, data processing took much longer than expected. This does not directly affect project deliverables but is something to consider in further ML research.
Data validation is important   
For both ML training and inference, using clear and simple data validation builds trust in the data. This helps build a reliable production system and keeps software bugs at a minimum.   
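"Clear and simple data validation" as described above can be sketched like this. The field names (`capacity_kw`, `generation_kw`) and thresholds are hypothetical examples, not the project's actual schema:

```python
def validate_pv_sample(sample: dict) -> list[str]:
    """Minimal validation of one PV data sample before training or inference.

    Returns a list of human-readable problems (empty list = valid), so the
    caller can decide whether to drop the sample or raise.
    """
    problems = []
    capacity = sample.get("capacity_kw")
    generation = sample.get("generation_kw")
    if capacity is None or capacity <= 0:
        problems.append("capacity_kw missing or non-positive")
    if generation is None or generation < 0:
        problems.append("generation_kw missing or negative")
    elif capacity is not None and capacity > 0 and generation > capacity:
        # A panel cannot generate more than its rated capacity.
        problems.append("generation exceeds capacity")
    return problems
```

Returning a list of problems rather than raising immediately keeps the check usable both in batch training pipelines (log and drop) and in live inference (reject the request).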
   
Engaging specialist UX/UI skills is important   
By acknowledging that UX and UI design is a specialised area and incorporating those skills, a UI has been developed which will be easier to use and convey information effectively. This will be validated over WP3 through working with the end users.    
   
Building our own hardware demonstrates value for money but may pose other challenges for a small team   
Two computers with a total of six GPUs have been built during the project, and it is estimated that using on-premises hardware instead of the cloud for data- and GPU-heavy machine learning R&D can significantly reduce direct costs. However, the time required for a small team to assemble all the components is significant (approximately 25 person-days in total). While the total costs would still be lower, appropriate resource planning should be considered for any future hardware upgrades.
 
In the course of WP3 and WP1 (extension), the project identified the following lessons:   
   
Merging the code right away when performing frontend testing is of the utmost importance
Deferring merges until after frontend testing proved to be time-consuming, and is important to consider when planning tests.
 
Large Machine Learning models are harder to productionise   
Large machine learning models proved difficult to productionise, and their size makes them difficult to use. Going forward, we need to investigate further how to deploy large models.
 
Machine Learning training always takes longer than expected  
Even with an already-built model, making data pipelines work correctly takes time. It is important to allocate enough time when planning ML training activities.
 
Security and authentication is hard  
Ensuring robust authentication and security measures are in place is harder than we envisaged. It may be easier to use existing packages or contract third-party providers to support the process.
 
Separate National Forecast Model  
The PVLive estimate of national solar generation does not equal the sum of PVLive's GSP generation estimates. This motivated building a separate National forecast rather than summing the GSP forecasts.
 
Investment is needed to take open-source contributions to the next level  
Time and resources are needed to engage with open-source contributors and develop an active community. We may want to consider hiring an additional resource to support this activity.  
 
Further lessons from the extension work packages were as follows:
 
Expensive cloud machines storage disk left idle   
We use some GCP/AWS machines for R&D and often pause them when they are not in immediate use, because some GPU machines can cost significant amounts per hour. It was discovered that costs still accrue for the disk (storage) of paused machines. There is no golden rule for balancing the pausing of machines (which allows them to start up quickly) against starting a cloud machine from scratch each time, but it is a trade-off worth being aware of.
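The cost effect described above is easy to quantify. The prices below are hypothetical placeholders, not actual GCP/AWS rates:

```python
# Even when a GPU machine is paused, its attached disk keeps billing.
disk_gb = 1000             # attached SSD size in GB (assumed)
disk_per_gb_month = 0.17   # price in currency units per GB-month (assumed)

idle_disk_cost_per_month = disk_gb * disk_per_gb_month
# about 170 per month for the disk alone, even if the VM never runs;
# this is the hidden cost that accrued on the project's paused machines.
```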
 
Challenging to maintain active communication with National Grid ESO  
High turnover in the National Grid ESO forecasting team affected communication on the project. This evolved over the duration of the project, and more active and easier communication was observed in the latter phase.

Reproducibility on cloud vs local servers  
When results differ between the cloud and local servers, it can be tricky to determine the cause. Verbose logging, saving intermediate data, and maintaining consistent package versions and setup on both machines helped. One particular bug involved results differing when multiple CPUs were used locally but only one CPU was used in the cloud.
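The verbose-logging practice described above can start with something as simple as recording the runtime environment at startup, so that cloud and local logs can be diffed. This is an illustrative sketch; the package list is an example:

```python
import logging
import multiprocessing
import platform
import importlib.metadata as md

# Log the runtime environment at startup so cloud-vs-local differences
# (Python version, CPU count, package versions) are visible in the logs.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("repro")

log.info("python=%s platform=%s cpus=%d",
         platform.python_version(), platform.platform(),
         multiprocessing.cpu_count())
for pkg in ["numpy", "pandas"]:
    try:
        log.info("%s=%s", pkg, md.version(pkg))
    except md.PackageNotFoundError:
        log.info("%s not installed", pkg)
```

Diffing these startup lines between the two machines would have surfaced the multi-CPU-vs-single-CPU discrepancy mentioned above immediately.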
  
Protection of production data  
Two environments in the cloud, “development” and “production,” are maintained to protect the “production” data. This setup allowed developers to access the “development” environment, where changes do not affect the live service. Although maintaining two environments increases costs, it is considered worthwhile.  
  
Probabilistic forecasts  
Some unreleased open-source packages were used to implement one of the probabilistic forecasts. The advantage of using this code before its release is noted, but it also means more thorough checks for bugs are required, which can take more time.