Predicting day-ahead prices in the energy market using BLSTM

Emanuel Kuhn and Yunusi Fuerkaiti
6 min read · Apr 11, 2021


In the contemporary competitive framework ruling the electricity sector, mixed dependencies exist between electrical and market data. These dependencies complicate the decision-making of energy actors, who must operate in a complex, uncertain environment and consequently need to rely on accurate multivariate, multi-step-ahead probabilistic predictions. Toubeau et al., 2018¹, proposed a deep learning-based prediction methodology and used it, among other things, to predict day-ahead prices.

In this work we aim to reproduce the day-ahead prices forecasted for a specific scenario outlined in the work of Toubeau et al., 2018. The original work was implemented in MATLAB and its source code is not openly accessible. The contribution of the present project is therefore to reproduce a part of the original work by implementing the outlined procedure in PyTorch. It is worth highlighting that we only focus on reproducing Figure 6 of the original paper with the Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. Hence, the copula-based sampling method and the other related statistical techniques involved in the original work are beyond the scope of this project. Moreover, it should be noted that in the original work the electricity prices were forecasted after the aggregated renewable generation and the total load, which served as additional meaningful explanatory variables. For the sake of simplicity, these two variables are not included in this project; instead, the prices are forecasted directly.

In the remainder of this post, we first explain the methods used to perform day-ahead price forecasting with a BLSTM. Afterwards, the results and discussion are presented. Finally, conclusions are given.

Methods and setup:

Input data

Obtaining the data and transforming it into useful training examples was a challenging aspect of reproducing this paper. Historical weather data is not freely available, and the electricity dataset is not freely sharable either. Instead of providing the dataset we worked with, we therefore provide a gist with instructions to obtain it. The original work noted that model quality increased with the introduction of weather data (such as temperature, cloud cover, etc.) coming from meteorological tools. In the present work we therefore also introduced weather data, i.e. wind speed and cloud cover, as two input features.

BLSTM

A BLSTM combines a bidirectional RNN with LSTM cells, which offers the benefits of both long-range memory and bidirectional processing. In other words, the core concept behind the BLSTM is that, instead of running an LSTM only in the forward direction starting from the first timestep, we run a second one from the last timestep back to the first. The BLSTM thus adds a hidden layer that passes information in the backward direction, so that at each timepoint the hidden state contains information from both past and future timepoints. Fig. 1 illustrates the architecture of a bidirectional RNN with a single hidden layer.

Fig. 1 Architecture of a bidirectional RNN².
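
In PyTorch, this bidirectional behaviour comes from setting bidirectional=True on nn.LSTM. A minimal sketch (the sizes below are illustrative, not those of our final model):

```python
import torch
import torch.nn as nn

# Setting bidirectional=True makes nn.LSTM run a second LSTM over the
# reversed sequence and concatenate both hidden states per timestep.
blstm = nn.LSTM(input_size=3, hidden_size=10, num_layers=1,
                batch_first=True, bidirectional=True)

x = torch.randn(8, 24, 3)  # (batch, 24 timesteps, 3 input features)
hidden, _ = blstm(x)
print(hidden.shape)        # torch.Size([8, 24, 20]), i.e. 2 * hidden_size
```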

Quantile regression

For predicting electricity prices, the authors were interested in learning a distribution of possible values instead of only the mean. Where a regression model commonly uses the Mean Squared Error loss, it is here replaced with the quantile loss function. For our BLSTM model this means that, compared to a BLSTM trained with an MSE loss, the output projector that normally maps each hidden state to a single output should instead map each hidden state to a vector with one entry per quantile.
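
A minimal sketch of such a quantile (pinball) loss; the set of quantiles below is our illustrative choice, not necessarily the one used in the paper:

```python
import torch

def quantile_loss(preds, target, quantiles):
    """Pinball loss averaged over all quantiles, timesteps, and samples.

    preds:  (batch, seq_len, n_quantiles) model outputs
    target: (batch, seq_len) observed prices
    """
    losses = []
    for i, q in enumerate(quantiles):
        error = target - preds[..., i]
        # Under-prediction is penalized by q, over-prediction by (1 - q).
        losses.append(torch.max(q * error, (q - 1) * error))
    return torch.mean(torch.stack(losses))

quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]  # illustrative choice
```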

Model

Our model thus looks as follows: first, a sequence of length 24 with input features for each timestep is fed into the BLSTM. This results in 24 hidden states. Then, using a Linear (fully connected) layer, each hidden state is mapped to a vector of the same size as the desired number of quantiles. This is the output of the model. The loss is then computed by passing the output into the quantile loss function along with the targets.
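
Putting these pieces together, a sketch of such a model in PyTorch (the feature and quantile counts are placeholders; the hidden size and layer count shown are the tuned values reported below):

```python
import torch
import torch.nn as nn

class QuantileBLSTM(nn.Module):
    """BLSTM that maps each of the 24 hidden states to one value per quantile."""

    def __init__(self, n_features, hidden_size, num_layers, n_quantiles):
        super().__init__()
        self.blstm = nn.LSTM(n_features, hidden_size, num_layers,
                             batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated: 2 * hidden_size.
        self.projector = nn.Linear(2 * hidden_size, n_quantiles)

    def forward(self, x):
        hidden, _ = self.blstm(x)      # (batch, 24, 2 * hidden_size)
        return self.projector(hidden)  # (batch, 24, n_quantiles)

model = QuantileBLSTM(n_features=3, hidden_size=80, num_layers=4, n_quantiles=5)
out = model(torch.randn(8, 24, 3))     # (8, 24, 5)
```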

Early stopping and weight noise addition

To improve the network's robustness on unseen data, the original work employed two regularization techniques. First, early stopping was implemented to stop the learning phase at the optimal time (before the network becomes too closely adapted to the training dataset). The early stopping functionality tracks the validation loss at each epoch and compares it with the best loss seen so far. As long as the validation loss is decreasing, it saves the trained model. Once the validation loss in an epoch increases over the best one, it starts counting the number of epochs since the validation loss stopped decreasing. If the validation loss keeps increasing in the following epochs and the count surpasses a tolerance number, training is stopped.
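
A sketch of this early stopping logic (the class and variable names are ours):

```python
import torch

class EarlyStopping:
    """Stop training once validation loss has not improved for `tolerance` epochs."""

    def __init__(self, tolerance, path="best_model.pt"):
        self.tolerance = tolerance
        self.path = path
        self.best_loss = float("inf")
        self.counter = 0
        self.stop = False

    def step(self, val_loss, model):
        if val_loss < self.best_loss:
            # Loss improved: reset the counter and save the best model so far.
            self.best_loss = val_loss
            self.counter = 0
            torch.save(model.state_dict(), self.path)
        else:
            self.counter += 1
            if self.counter > self.tolerance:
                self.stop = True
```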

The second technique is the addition of weight noise during training, which encourages the network to ignore irrelevant information (noise in the data). In this project we employed both techniques to improve our model's performance. It should be noted that the associated hyperparameters (such as the variance of the weight noise, or the learning rate of the gradient descent procedure) have to be optimized together with the complexity of the network architecture during model selection.
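
One way to implement weight noise in PyTorch, sketched below, is to perturb the weights before each training step and undo the perturbation before the optimizer update, so that gradients are computed at the noisy weights but applied to the clean ones. This is our reading of the technique, not code from the original work:

```python
import torch

def add_weight_noise(model, variance):
    """Perturb every parameter with zero-mean Gaussian noise; return the
    noise tensors so the perturbation can be undone later."""
    std = variance ** 0.5
    noises = []
    with torch.no_grad():
        for param in model.parameters():
            noise = torch.randn_like(param) * std
            param.add_(noise)
            noises.append(noise)
    return noises

def remove_weight_noise(model, noises):
    """Undo the perturbation so the optimizer updates the clean weights."""
    with torch.no_grad():
        for param, noise in zip(model.parameters(), noises):
            param.sub_(noise)

# Per training step: add noise, run forward/backward, remove noise, then step.
```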

Hyperparameter optimization

In this work, the open-source hyperparameter optimization framework Optuna is used to optimize the network's performance. We chose Optuna for its flexibility and its compatibility with PyTorch. The learning rate, the number of layers, the hidden size, the variance of the weight noise, and the tolerance number for early stopping are considered for optimization.
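
A sketch of an Optuna study over this search space; the ranges are illustrative, and train_and_validate is a hypothetical helper that trains the BLSTM with the given hyperparameters and returns the best validation loss:

```python
import optuna

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
        "num_layers": trial.suggest_int("num_layers", 1, 5),
        "hidden_size": trial.suggest_int("hidden_size", 10, 128),
        "noise_variance": trial.suggest_float("noise_variance", 1e-3, 1.0, log=True),
        "tolerance": trial.suggest_int("tolerance", 3, 10),
    }
    # train_and_validate is a hypothetical helper (not shown) that trains
    # the model with these hyperparameters and returns the validation loss.
    return train_and_validate(params)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```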

Results and discussion:

Training and validation loss with and without hyperparameter optimization are shown in Fig. 2. For the case without optimization, the parameters are set as follows: hidden size = 10, number of layers = 3, learning rate = 0.007, variance of the weight noise = 0.01, and tolerance of the early stopping = 6. The parameters obtained with hyperparameter tuning are: hidden size = 80, number of layers = 4, learning rate = 0.0013, variance of the weight noise = 0.5131, and tolerance of the early stopping = 8.

Fig. 2 Training and validation losses with and without optimization.

It is interesting to see that in this work the hyperparameter optimization does not improve the network performance by much. This is possibly because we were not able to run the optimization for a long time.

Fig. 3 displays the result of running the model on some data in the validation set. It shows that the early stopping routine indeed results in a good fit on the validation set. Below we compare a plot from the validation set to one from the test set, which we did not optimize for.

Fig. 3 Validation result for day-ahead electricity prices.

In order to evaluate the model, we also apply it to unseen test data. A result for one week is shown in Fig. 4. It can be seen that the actual values generally stay within the predicted quantiles. The quality of the fit also seems similar to that obtained on the validation set, which indicates that the model generalizes to new data.

Fig. 4 Test result for day-ahead electricity prices.

Conclusion

In this project, the day-ahead prices forecasted for a specific scenario outlined in the work of Toubeau et al., 2018, are reproduced by implementing the outlined methods in PyTorch. As suggested in the original work, early stopping and weight noise addition are implemented to improve the predictive capability of the model. Furthermore, hyperparameter optimization is performed using the open-source framework Optuna. The hyperparameter optimization slightly improves the network performance.

Overall, the model predicts the trend of the actual data well. Furthermore, our results look comparable to those achieved in the paper, and we thus conclude that we managed to reproduce it.

Link to the notebook.

Link to gist for obtaining data.

[1] J.-F. Toubeau, J. Bottieau, F. Vallée, and Z. De Grève, "Deep Learning-Based Multivariate Probabilistic Forecasting for Short-Term Scheduling in Power Markets," IEEE Transactions on Power Systems, 2019, doi:10.1109/TPWRS.2018.2870041.

[2] J. Quinn, J. McEachen, M. Fullan, M. Gardner, and M. Drummy, Dive Into Deep Learning: Tools for Engagement, Corwin Press, 2019.
