A sampling criterion for constrained Bayesian optimization with uncertainties

We consider the problem of chance constrained optimization, where a function is to be optimized and constraints satisfied, both being affected by uncertainties. Real-world instances of this problem are particularly challenging because of their inherent computational cost. To tackle such problems, we propose a new Bayesian optimization method. It applies to the situation where the uncertainty comes from some of the inputs, so that it becomes possible to define an acquisition criterion in the joint controlled-uncontrolled input space. The main contribution of this work is an acquisition criterion that accounts for both the average improvement in objective function and the constraint reliability. The criterion is derived following the Stepwise Uncertainty Reduction logic and its maximization provides both optimal controlled and uncontrolled parameters. Analytical expressions are given to efficiently calculate the criterion. Numerical studies on test functions are presented. It is found through experimental comparisons with alternative sampling criteria that the match between the sampling criterion and the problem contributes to the efficiency of the overall optimization. As a side result, an expression for the variance of the improvement is given.


Introduction
Despite the long-term research efforts put into numerical optimization, many practical applications remain difficult. There are three main reasons: most real problems involve nonlinear constraints, the objective function or the constraints are numerically costly to evaluate (e.g., when nonlinear finite elements underlie the optimization criteria), and some of the parameters are uncertain.
To ease the computing load, Bayesian Optimization (BO) incorporates kriging surrogates to save calls to the objective function, as embodied in the archetypal EGO algorithm [46]. The original optimization problem is translated into a series of other problems, that of the acquisition of new points where the costly function will be calculated. The acquisition criterion is based on the kriging model and it balances the optimization of the function with the improvement of the kriging model [20]. BO has rapidly been extended to encompass constraints [44,37,17,21].
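The expected-improvement acquisition mentioned above has a closed form in the GP posterior mean and standard deviation. A minimal stdlib-Python sketch for a minimization problem (the function names are ours, not from any particular BO library):

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, best):
    """EI for minimization: E[(best - Z)^+] with Z ~ N(mean, std^2)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    gamma = (best - mean) / std
    return std * (gamma * normal_cdf(gamma) + normal_pdf(gamma))
```

At the current best value (mean equal to `best`, unit variance), EI reduces to the normal density at zero; it grows as the posterior mean drops below the incumbent, which is what drives the acquisition toward promising regions.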
In this article, the focus is not only on costly and general nonlinear constrained optimization but also on problems that are affected by uncertainties.
Uncertainties may originate from random environments such as weather, noise in sensors or uncontrolled boundary conditions. Many physical models also come with uncertainties in an attempt to describe a lack of knowledge about the true phenomena. Big data applications are confronted with uncertainties which reflect the part of the data that cannot be handled at a given time, either because the data arrive as a dynamic flow, or because the volume is too large to be processed in a single step. Since uncertainties are so ubiquitous, robustness against uncertainties is becoming an important aspect of optimization problem solutions.
When the uncertainties cannot be characterized in stochastic terms such as probabilities, strong guarantees about the robustness of the solutions can be obtained with deterministic approaches based on worst-case scenarios over the set of possible uncertainties [8,19,29]. If this set is large (possibly infinite) and the problem non-convex, the conservatism of the solutions and the computational tractability are inherent difficulties of this family of methods. When the uncertainties are seen as stochastic, two situations may be distinguished.
In a first class of problems, the uncertainties are instantiated within the objective or constraint functions and cannot be chosen. Such uncertainties are a hard-coded, not controllable, noise corrupting the objective and the constraint functions. A typical situation is when the functions resort to random number generation, e.g., for a Monte Carlo simulation, and no access to the source code is granted. Stochastic algorithms can, under conditions about their own stochasticity, accommodate the noise in the observations and still converge to problem solutions: this has given rise in the early 50's to stochastic descent algorithms [24,3] that have since then experienced great developments [48,2], often in relation to machine learning [25]; well-performing versions of (stochastic) evolutionary algorithms for noisy functions have been identified [11,28] thanks to competitions on reference benchmarks [4].
In a second class of problems, the uncertainties perturb parameters that are distinct from the optimization variables and can be chosen during the simulations. The separation between optimization (or design) variables and uncertain parameters was already underlying Taguchi's design of experiments in the 80's [27]. Because of this separation, and provided that a probability of occurrence of the uncertainties exists, a statistical modelling in the joint design × uncertain parameters space is possible. This will be the context of the current work.
A key step when optimizing in the presence of uncertainties is the formulation of the problem, i.e., the choice of the robustness criteria. Considering first unconstrained problems, relevant criteria are the expectation of the objective [23] or one of its (super-)quantiles [49,50]. In Robust Optimization, the uncertainties are handled in terms of specific compromises between the average performance and its dispersion [34,9] or by investigating all such Pareto optimal compromises through a multi-objective formulation [42].
When there are constraints that depend on the uncertainties, the feasibility of the solutions is typically measured in probability. Probabilistic models of the constraints are called chance constraints [32] or reliability constraints [10]. The field of Reliability-Based Design Optimization (RBDO) is concerned with the resolution of optimization problems that contain reliability constraints [5]. The optimization problems are formulated in terms of statistical criteria such as probabilities of satisfying the constraints, expectations or (super-)quantiles or conditional quantiles of the objective function [49,26,40]. When the statistical criteria cannot be calculated analytically, the bottleneck of the computational cost of the optimization is even more stringent since the statistical criteria must be numerically estimated within the optimization iterations. This is sometimes called the double loop issue, i.e., for each design point visited, many simulations are needed to encompass the effects of the uncertainties. Several paths have been proposed to circumvent the double loop issue, some approximating the probability of feasibility by reliability indices [51], others improving the efficiency of the probability calculations (e.g., stratified sampling in [53]), others decoupling the reliability estimation from the optimization. [47] gives a review of some of these techniques.
In the last decade, numerous contributions to the optimization of costly functions with uncertainties have relied on the learning of a metamodel of the true functions, in particular Gaussian processes (GP) [14,31]. In [23] and [1], the GP not only helps for the optimization (or for the inversion) of the design variables, but it also serves to define an optimal sampling scheme.
In this article, the problem of minimizing the mean of a stochastic function under chance constraints is addressed. The objective function and the constraints are costly in the sense that they cannot be calculated more than a hundred times. Furthermore, the problem is assumed to be nonconvex so that part of the solution process will not be analytical. Uncertainties are described by parameters different from the optimization variables and that can be chosen in the calculations. Generalizing [23], an optimization and sampling Bayesian procedure is proposed that accounts for probabilistic constraints. The acquisition criterion is based on the feasible improvement, which is known to be more myopic (less explorative) but also numerically more tractable than its information-theoretic counterparts [21,36]. Indeed, information-theoretic criteria, which quantify reductions in the entropy of the predicted optima locations or values, imply internal optimizations of the GP trajectories. Note also that [21,36] do not consider uncertain parameters. A less myopic feasible improvement criterion, which looks two steps ahead, is given in [52]; yet, again, this work does not address uncertain parameters. On the contrary, uncertainties are examined in [41]: thanks to a spectral decomposition of the GP covariance functions, the variance and the mean of the objective function under the uncertain parameters become a GP, which then opens the way to acquisition criteria based on improvement. The constraint bears on the objective variance. Our contribution differs from [41] essentially by the proposition of a policy to choose where to evaluate the uncertain parameters, and secondarily, by the consideration of general constraints.
After formulating the problem (Sections 2.1 and 2.2), a principle for devising robust Bayesian algorithms is stated (Section 2.3) which applies to any given progress measure. In Section 3, this principle is applied to the feasible improvement as a specific progress measure. The associated sampling criterion is introduced in Section 4. It is a Stepwise Uncertainty Reduction (SUR) criterion [7], a one-step-ahead variance reduction, for which a proxy that is easier to compute is presented. The resulting algorithm is summarized in Section 5, and its performance is assessed on an analytical and an industrial test case.

Problem formulation

Starting problem
Let f(x, u) be the scalar output of an expensive computer simulation and let g_i(x, u), i = 1, ..., l, be the set of constraints, where x ∈ S_X can be precisely chosen while u ∈ S_U is a realization of a vector of random variables U with a specified probability density function ρ_U. Such a formulation with design and random variables is general to all optimization problems under uncertainty. The examples in this article belong to continuous optimization in the sense that S_X ⊂ R^d and S_U ⊂ R^m. Nevertheless, it is important to note that the framework of our work, the Gaussian processes and the algorithms which will be introduced, generalize nicely to spaces that contain discrete variables (see for example [35,43]).
Our goal is to find x which minimizes f while ensuring that the g_i's lie below a failure threshold (0 in general). In the presence of uncertainties, Robust Optimization aims at controlling the impact of uncertainties on the performance of the optimal solution. The problem is that f(x, U) and g_i(x, U), i = 1, ..., l, are random quantities induced by U. In order to perform optimization, we need to fall back on a deterministic form, which is achieved by applying some statistical measures to f(x, U) and g_i(x, U), i = 1, ..., l [49,26].
In this article, the constrained optimization problem under uncertainties is formulated as the minimization of the expectation over U of f while all the constraints g_i(x, U) ≤ 0, i = 1, ..., l, are simultaneously satisfied with a high probability:

min_{x ∈ S_X} E[f(x, U)]  such that  P(∩_{i=1}^l {g_i(x, U) ≤ 0}) ≥ 1 − α,    (1)

where α is a reliability parameter representing the allowed constraint violation level (0 < α < 1).
In the formulation of Equation (1), the emphasis is on constraint satisfaction with a guaranteed reliability. This is common in RBDO where constraints are satisfied in probability and the objective function is deterministic [14]. Such problems are also said to have chance constraints [8]. In addition, in Equation (1), random events affect the objective function and are taken into account through an expectation. Thanks to its linearity, the expectation is the simplest statistical measure. Besides, efficient approaches exist to estimate and optimize it [23]. Formulations such as Equation (1) with a mean objective function and chance constraints have been called a "model for qualitative failure" [2], in which the objective function and the constraints are taken as independent. Other formulations with quantiles conditioned by feasibility have been given in [26,40] but they remain an open challenge for costly problems. In the current work, Equation (1) is addressed because it is a compromise between complete robust formulations and mathematical tractability.
By seeing the probability as an expectation, the constraint part of Equation (1) becomes

P(∩_{i=1}^l {g_i(x, U) ≤ 0}) = E[1_{∩_{i=1}^l {g_i(x, U) ≤ 0}}] ≥ 1 − α.

From the last expression, the problem (1) is equivalent to

min_{x ∈ S_X} z(x)  such that  c(x) ≤ 0,    (2)

where z(.) := E[f(., U)] and c(x) := 1 − α − E[1_{∩_{i=1}^l {g_i(x, U) ≤ 0}}].
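At a given x, the deterministic quantities z(x) and c(x) can be estimated by Monte Carlo over U. A stdlib sketch on a toy objective and constraint (both of our own choosing, with U taken standard normal purely for illustration):

```python
import random

def estimate_z_and_c(f, g_list, x, alpha, n_mc=20000, seed=0):
    """Monte Carlo estimates of z(x) = E[f(x, U)] and
    c(x) = 1 - alpha - P(all g_i(x, U) <= 0), with U ~ N(0, 1) here."""
    rng = random.Random(seed)
    sum_f, n_feasible = 0.0, 0
    for _ in range(n_mc):
        u = rng.gauss(0.0, 1.0)
        sum_f += f(x, u)
        if all(g(x, u) <= 0.0 for g in g_list):
            n_feasible += 1
    z = sum_f / n_mc
    c = 1.0 - alpha - n_feasible / n_mc
    return z, c
```

This naive double-loop estimator is exactly what the GP models of the next section are meant to avoid: each call to `f` and `g` would be an expensive simulation.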

Gaussian Process regression framework
In the context of expensive computer simulation, the Problem (2) is approximated with Gaussian processes (GPs). Directly building a metamodel for z and c would need too many evaluations of f and the g_i's to estimate the expectation and the probabilities. Therefore, GP approximations of f and the g_i's are built in the joint space S_X × S_U. Models for z and c in the design space S_X are then deduced from them. More precisely, we suppose that f and the constraints (g_i)_{i=1}^l are realizations of independent Gaussian processes F and G_i such that

F ~ GP(m_F, k_F),   G_i ~ GP(m_Gi, k_Gi),  i = 1, ..., l,

where m_F and m_Gi are the mean functions while k_F and k_Gi are the covariance functions.
Let F^(t) and G_i^(t) denote the Gaussian processes conditioned on the t observations, f^(t) = (f(x_1, u_1), ..., f(x_t, u_t)) and g_i^(t) = (g_i(x_1, u_1), ..., g_i(x_t, u_t)), obtained at points D^(t) = {(x_k, u_k), k = 1, ..., t}. Since the expectation is a linear operator applied to f, it follows that Z^(t)(x) := E_U[F^(t)(x, U)] is a Gaussian process, with mean and covariance given by

m_Z^(t)(x) = ∫_{S_U} m_F^(t)(x, u) ρ_U(u) du,   k_Z^(t)(x, x') = ∫_{S_U} ∫_{S_U} k_F^(t)((x, u), (x', u')) ρ_U(u) ρ_U(u') du du'.    (3)

The integrals that appear in (3) can be evaluated analytically for specific choices of ρ_U and k_F (see [23]). In the general case, a quadrature rule can be used to approximate these integrals, or a Monte Carlo method if m is large.
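When no closed form is available, the integrals of Equation (3) can be discretized on a fixed u sample. A small sketch, where `m_F` and `k_F` stand for the conditional GP mean and covariance, passed here as plain callables (our own interface, for illustration):

```python
def mean_objective_gp(m_F, x, u_sample):
    """Approximate m_Z(x) = E_U[m_F(x, U)] by averaging the GP mean
    over a fixed sample {u_j} representing rho_U (quadrature points
    or common random numbers)."""
    return sum(m_F(x, u) for u in u_sample) / len(u_sample)

def cov_objective_gp(k_F, x1, x2, u_sample):
    """Approximate k_Z(x1, x2) = E[k_F((x1, U), (x2, U'))] with the
    double sum over the same u sample."""
    M = len(u_sample)
    return sum(k_F((x1, u), (x2, v)) for u in u_sample for v in u_sample) / (M * M)
```

The double sum makes the covariance approximation O(M^2) per pair (x1, x2), which is one reason the paper later bounds M (M = 300 in the experiments).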
We also introduce the process

C^(t)(x) := 1 − α − E_U[1_{∩_{i=1}^l {G_i^(t)(x, U) ≤ 0}}],

which is the statistical model of the constraint c (Equation (2)); the expectation is taken over U only, so that C^(t)(x) remains random through the constraint GPs. Note that the process C^(t) is not Gaussian.
To solve Problem (2), Bayesian optimization is used where functions z and c are considered realizations of processes Z and C. The procedure is detailed next.

A general principle for devising robust Bayesian optimization algorithms
Now that the problem has been formulated (Section 2.1) and the GP models introduced (Section 2.2), we present a Bayesian algorithm to solve Problem (1) within a restricted number of calls to f and the g_i's. Before going into the details of the method, it is important to understand the general principle that underlies the design of robust Bayesian Optimization (BO) algorithms.

Proposition 1 (A general principle to devise robust BO algorithms).
Robust Bayesian Optimization algorithms can be designed as follows: A) define a progress measure P(x) in relation with the problem formulation and calculated from the GP trajectories. B) The robust BO algorithm is:

Define an initial space-filling design in the joint space S_X × S_U: D^(n0)
Initialize all conditional GPs F^(n0) and G_i^(n0), i = 1, ..., l, with D^(n0)
while stopping criterion not met do
  Determine a desirable, targeted x by maximizing the expectation of the progress measure,
    x_targ = arg max_{x ∈ S_X} E[P^(t)(x)]    (4)
  The next iterate minimizes the one-step-ahead variance of the progress measure at x_targ,
    (x_{t+1}, u_{t+1}) = arg min_{(x̃, ũ) ∈ S_X × S_U} VAR[P^(t+1)(x_targ)]    (5)
  where P^(t+1) is evaluated with GPs updated according to the candidate observation at (x̃, ũ)
  Calculate the simulator response, i.e., f and g_i, i = 1, ..., l, at the next point (x_{t+1}, u_{t+1})
  Update the design D^(t+1) = D^(t) ∪ {(x_{t+1}, u_{t+1})} and the GPs F^(t+1), G_i^(t+1), i = 1, ..., l
end while

Various algorithms can be obtained by changing the measure of progress P. In this article it will be the feasible improvement, which will soon be presented. Other measures of progress are possible, for example a penalized objective P(x) = −Z(x) − Σ_{i=1}^l p_i max(0, G_i(x, u_mod)), where the p_i's are positive penalty parameters and u_mod is the mode of U, or P(x) = max[z_min^feas − Z(x) | C(x) ≤ 0, 0], where z_min^feas is the best objective function value associated with a feasible point. The goal of the next sections is to present the methodology and the associated formulas when the progress measure is the feasible improvement, chosen for its closeness to the problem formulation and its computability. The one-step-ahead variance can be difficult to tackle, so approximations are useful. In this text, the generic term "sampling criterion" relates to the one-step-ahead variance or its proxy. For costly problems, the stopping criterion in Proposition 1 is often a given number of calls to the simulator.
The progress measure and the associated targeted point x_targ

Following Proposition 1, the first step consists in defining a progress measure P^(t), which will be the cornerstone of the definition of what a most promising candidate for evaluation, x_targ, is. The maximization of its expectation should contribute to both solving the constrained optimization problem and improving the GPs. The most popular progress measure for optimization under constraints is the Feasible Improvement [46,45], defined by

FI^(t)(x) := (z_min^feas − Z^(t)(x))^+ × 1_{C^(t)(x) ≤ 0},

where (.)^+ = max(., 0) denotes the improvement over the current feasible minimum value.
In our case, z_min^feas must be further explained because it is not directly observed. This will be the subject of the next section. The definition of z_min^feas and the fact that C^(t)(x) is not Gaussian distinguish the FI of this article from those in [46,45].
Following Proposition 1 and Equation (4), the promising point in the control space is obtained by maximizing the expectation of the progress measure. Here it corresponds to maximizing the Expected Feasible Improvement (EFI),

x_targ = arg max_{x ∈ S_X} EFI^(t)(x),  where  EFI^(t)(x) := E[FI^(t)(x)].    (6)

The independence of the GP processes implies that the EFI can be expressed as

EFI^(t)(x) = EI^(t)(x) × P(C^(t)(x) ≤ 0).

The first term is the well-known Expected Improvement (EI), for which an analytical expression is available,

EI^(t)(x) = (z_min^feas − m_Z^(t)(x)) Φ((z_min^feas − m_Z^(t)(x)) / σ_Z^(t)(x)) + σ_Z^(t)(x) φ((z_min^feas − m_Z^(t)(x)) / σ_Z^(t)(x)),

where m_Z^(t)(x) and σ_Z^(t)(x) are the mean and standard deviation of Z^(t) at x, and Φ and φ are the normal cumulative distribution and density functions, respectively. The second term, P(C^(t)(x) ≤ 0), can be approximated with available numerical methods (see details in Section 5.2).

Definition of the current feasible minimum z_min^feas
To solve problem (6), we need to define z_min^feas. We extend the definition of the current minimum for a non-observed process Z introduced in [23] to a problem with constraints. z_min^feas is defined as the minimum of the mean of the process Z^(t) such that the constraint is satisfied in expectation,

z_min^feas = min_{x ∈ S_X} m_Z^(t)(x)  such that  E[C^(t)(x)] ≤ 0.    (8)

Under Fubini's condition, and as the constraints are conditionally independent given x and u, the expectation of C^(t) is an integral over the uncertain space of a product of univariate Gaussian cumulative distribution functions,

E[C^(t)(x)] = 1 − α − ∫_{S_U} ∏_{i=1}^l Φ(−m_Gi^(t)(x, u) / σ_Gi^(t)(x, u)) ρ_U(u) du.

If there is no solution to problem (8), we choose the most feasible point in expectation, i.e., the minimizer of E[C^(t)(x)] over S_X.
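The expectation of C^(t), an integral over u of a product of univariate Gaussian CDFs, can be discretized on a fixed u sample. A sketch, where `means_sigmas` is a hypothetical callable returning the list of constraint-GP means and standard deviations at (x, u):

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_C(means_sigmas, alpha, u_sample):
    """E[C(x)] ~= 1 - alpha - (1/M) sum_j prod_i Phi(-m_i(x,u_j)/s_i(x,u_j)),
    where means_sigmas(u) returns [(m_i, s_i)] for each constraint GP."""
    total = 0.0
    for u in u_sample:
        p = 1.0
        for m, s in means_sigmas(u):
            p *= normal_cdf(-m / s)  # P(G_i <= 0) at (x, u)
        total += p
    return 1.0 - alpha - total / len(u_sample)
```

A negative value means the point is feasible in expectation, which is the membership test used when searching for z_min^feas in problem (8).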

Extending the acquisition criterion
The sequential methodology introduced in Proposition 1, Equation (5), requires to choose a couple (x_{t+1}, u_{t+1}) such that the variance of the one-step-ahead feasible improvement is minimal, i.e.,

(x_{t+1}, u_{t+1}) = arg min_{(x̃, ũ) ∈ S_X × S_U} VAR[I^(t+1)(x_targ) × 1_{C^(t+1)(x_targ) ≤ 0}],    (10)

where I^(t+1) and C^(t+1) are the updated I^(t) and C^(t) taking into account the observations at the point (x̃, ũ). This choice increases the most the information provided at the current point of interest, x_targ, where the information now includes both the probabilistic constraints and the improvement.
In [23], the authors noted that x_{t+1} is usually very close to x_targ, so instead of looking for the couple (x_{t+1}, u_{t+1}), we assume that x_{t+1} = x_targ. This simplifies the optimization of the sampling criterion because it reduces the dimension of the minimization (Equation (10)). As the one-step-ahead variance of the feasible improvement is difficult to evaluate, a computationally more tractable quantity, built with a product of variances, is now proposed.

The sampling criterion
We introduce a new sampling criterion specific to the robust optimization Problem (1), denoted S for Sampling as it is used to generate a new value for u. It is inspired by the variance of the one-step-ahead feasible improvement but easier to assess.

Proposition 2. The sampling criterion, given a new observation at point (x̃, ũ), is

S(x̃, ũ) = VAR[I^(t+1)(x_targ)] × E_U[VAR(1_{∩_{i=1}^l {G_i^(t+1)(x_targ, U) ≤ 0}})].    (11)

In Equation (11), the first part of the expression forces the sampling to reduce the uncertainty in the improvement value while the second part focuses on the averaged predicted feasibility variance. For the sake of calculation simplicity, it might seem preferable to replace the improvement by the objective process Z^(t+1) and the variance of feasibility by the variance of the constraints. However, such variances are not those of the quantities of interest. In optimization, it is not important to reduce the variance of the constraints if it is clear that points are feasible or infeasible. However it is important to reduce the variance when there is a large uncertainty on feasibility, which is achieved by considering the variance of the Bernoulli variable 1_{∩_{i=1}^l {G_i^(t+1)(x_targ, u) ≤ 0}}. In the same way, the variance of Z^(t+1) in regions where it is clearly above the target does not matter, but the variance of the process where improvement is uncertain should be reduced.
The ideal sampling criterion for the optimization problem at hand, the variance of the feasible improvement (Equation (10)), is the variance of the product of the improvement times a feasibility with confidence. The criterion S bears some resemblance but is not equivalent to the variance of the feasible improvement. It is a product of variances, which is not equivalent to the variance of a product (for independent variables U and V, the variance of the product UV equals the product of the variances only if E[U] = E[V] = 0). Furthermore, the second term in the product for S (Equation (11)) averages the feasibility variance over U instead of taking the variance of the averaged feasibility. So, we choose the best candidate uncertainty u_{t+1} as

u_{t+1} = arg min_{ũ ∈ S_U} S(x_targ, ũ).    (12)

The calculation of both terms of S is now presented.

Calculation of VAR[I^(t+1)(x_targ)]

The quantity needed is

VAR[I^(t+1)(x_targ)] = VAR[(z_min^feas − Z^(t+1)(x_targ))^+],    (13)

given a new observation at point (x̃, ũ). The expression of the expected improvement in terms of the PDF and CDF of the Gaussian distribution is well known. The closed-form expression of the Variance of the Improvement stated in Proposition 3 can be obtained using the expression of the generalized expected improvement criterion introduced in [46]. For the convenience of the readers, we provide a complete proof in Appendix A.1.
Proposition 3 (The Variance of the Improvement). Let T be a threshold and Z^(s) a Gaussian process with mean m_Z^(s) and standard deviation σ_Z^(s); then, with γ^(s)(x) = (T − m_Z^(s)(x)) / σ_Z^(s)(x),

VAR[(T − Z^(s)(x))^+] = (σ_Z^(s)(x))² [(γ^(s)(x)² + 1) Φ(γ^(s)(x)) + γ^(s)(x) φ(γ^(s)(x))] − EI^(s)(x)²,

where EI^(s)(x) = σ_Z^(s)(x) [γ^(s)(x) Φ(γ^(s)(x)) + φ(γ^(s)(x))].

As f(x̃, ũ) is unknown, we cannot apply Proposition 3 directly to Equation (13): the updated moments of Z^(t+1) depend on the unobserved value F^(t)(x̃, ũ). We use the law of total variance,

VAR[I^(t+1)(x_targ)] = E[VAR(I^(t+1)(x_targ) | F^(t)(x̃, ũ))] + VAR(E[I^(t+1)(x_targ) | F^(t)(x̃, ũ)]).    (14)

To compute Equation (14), notice that the terms inside the brackets have closed-form expressions in terms of m_Z^(t+1) and σ_Z^(t+1) thanks to Proposition 3. Finally, the external E and VAR in Equation (14) are numerically evaluated. Details are given in Section 5.2.

Calculation of the averaged feasibility variance E_U[VAR(1_{∩ {G_i^(t+1)(x_targ, U) ≤ 0}})]
Regarding the second term, we follow the Kriging Believer principle: the unknown observations g_i(x̃, ũ) are replaced by the GP means m_Gi^(t)(x̃, ũ), ∀i = 1, ..., l. Under the hypothesis of independence of the constraint GPs, the feasibility indicator is a Bernoulli variable, so that

E_U[VAR(1_{∩_{i=1}^l {G_i^(t+1)(x_targ, U) ≤ 0}})] = ∫_{S_U} p^(t+1)(u) (1 − p^(t+1)(u)) ρ_U(u) du,  with  p^(t+1)(u) = ∏_{i=1}^l Φ(−m_Gi^(t+1)(x_targ, u) / σ_Gi^(t+1)(x_targ, u))

(see [13]). Further details about the numerical estimation of the above integral are given in Section 5.2.
The steps of the proposed methodology are summarized in Algorithm 1, called EFISUR for Expected Feasible Improvement with Stepwise Uncertainty Reduction sampling.

Algorithm 1 : EFISUR
Create an initial Design of Experiments (DoE) of size t in the joint space and calculate simulator responses: D^(t) = {(x_i, u_i), i = 1, ..., t}, and associated f^(t) and g_i^(t)
while t ≤ maximum budget do
  Create the GPs of the objective and the constraints in the joint space: F^(t) and (G_i^(t))_{i=1,...,l}
  Calculate the GP of the mean objective, Z^(t), in the search space S_X
  Optimize EFI to define x_targ = arg max_{x ∈ S_X} EFI^(t)(x) (Eq. (6))
  Set x_{t+1} = x_targ
  Sample the next uncertain point by solving u_{t+1} = arg min_{ũ ∈ S_U} S(x_targ, ũ) (Eq. (12))
  Calculate simulator responses at the next point (x_{t+1}, u_{t+1})
  Update the DoE: D^(t+1) = D^(t) ∪ {(x_{t+1}, u_{t+1})}, f^(t+1) = f^(t) ∪ f(x_{t+1}, u_{t+1}), g_i^(t+1) = g_i^(t) ∪ g_i(x_{t+1}, u_{t+1}), i = 1, ..., l; t ← t + 1
end while

Numerical experiments
In this section the performance of the EFISUR method is studied first on an analytical test case, and then on an industrial application.The results are compared to two alternative procedures which are described below.
The code and data generated or used during the current study are available in the first author's GitHub repository [16].

Competing algorithms
Two algorithms serve as bases for comparison for EFISUR. To the best of our knowledge, although inspired by previous works, these algorithms are new. First, the EFIrand algorithm is identical to EFISUR with the exception of the point u_{t+1}, which is simply sampled from its distribution. This EFIrand algorithm will be useful to judge the usefulness of the sampling criterion S in EFISUR.
Algorithm 2 : EFIrand — Expected Feasible Improvement with random sampling
Create an initial DoE of size t in the joint space and calculate simulator responses: D^(t) = {(x_i, u_i), i = 1, ..., t}, and associated f^(t) and g_i^(t)
while t ≤ maximum budget do
  Create the GPs of the objective and the constraints in the joint space: F^(t) and (G_i^(t))_{i=1,...,l}
  Calculate the GP of the mean objective, Z^(t), in the search space S_X
  Optimize EFI to define x_{t+1} = arg max_{x ∈ S_X} EFI^(t)(x) (Eq. (6))
  Sample the next uncertain point randomly, u_{t+1} ~ ρ_U
  Calculate simulator responses at the next point (x_{t+1}, u_{t+1})
  Update the DoE: D^(t+1) = D^(t) ∪ {(x_{t+1}, u_{t+1})}, f^(t+1) = f^(t) ∪ f(x_{t+1}, u_{t+1}), g_i^(t+1) = g_i^(t) ∪ g_i(x_{t+1}, u_{t+1}), i = 1, ..., l; t ← t + 1
end while

The other algorithm to which EFISUR will be compared uses the quantile as an alternative way of measuring the failure probability. For the first constraint, the quantile writes

q_{1−α}(g(x, U)) := inf{q ∈ R : P(g(x, U) ≤ q) ≥ 1 − α},    (16)

so that the chance constraint P(g(x, U) ≤ 0) ≥ 1 − α is equivalent to q_{1−α}(g(x, U)) ≤ 0. When dealing with several constraints, P(g(x, U) ≤ 0) is then replaced by individual constraints with quantile level 1 − α/l. The quantile of the constraint i = 1, ..., l is approximated using the predictive mean of the GP model G_i^(t), i.e., q_{1−α/l}(m_Gi^(t)(x, U)). Further implementation details about the quantile estimation are given in Section 5.2. In order to choose the next point in the design space, x_{t+1}, this last competing methodology maximizes the expected improvement under constraints on the empirical quantiles:

x_{t+1} = arg max_{x ∈ S_X} EI^(t)(x)  such that  q_{1−α/l}(m_Gi^(t)(x, U)) ≤ 0, i = 1, ..., l.    (17)

The sampling in the uncertain space is based on the deviation number developed in [15,18] for the Active Kriging Monte Carlo simulation technique. In these works, the uncertainty that is most likely to improve the GP is the one that minimizes the following DN (Deviation Number) function,

DN(u) = |m_G^(t)(x_{t+1}, u)| / σ_G^(t)(x_{t+1}, u).    (18)

The points that have a low deviation number are either close to the constraint threshold (null in Equation (18)), or they have a high GP variance. To handle multiple constraints, the constraint with the minimum value of DN is selected (as in [30]),

DN_c(u) = min_{i=1,...,l} |m_Gi^(t)(x_{t+1}, u)| / σ_Gi^(t)(x_{t+1}, u).    (19)

The whole algorithm is called cEIdevNum for Constrained EI plus Deviation Number. The cEIdevNum algorithm is an alternative to EFISUR for handling chance constraints in the augmented space. During the optimization step for x_{t+1}, the constraints are handled explicitly through an ancillary constrained optimization algorithm, as opposed to the EFI that aggregates them into a single objective. However, the reliability is defined independently for each constraint, which is not equivalent to the real problem given in Equation (1). The sampling step (the DN minimization) accounts only for constraint satisfaction and not for the objective function. The cEIdevNum algorithm bears some resemblance to the approach described in [31]: in both cases, the reliability constraint is estimated with a kriging model used to estimate quantiles, and sampling occurs through the minimization of the deviation. The generalization of EGO to constrained problems by approximating the constraints through kriging models and keeping them separated from the objective, as it is done in Equation (17), can be found in other papers, e.g., [6]. However, to the authors' knowledge, cEIdevNum integrates these techniques within an EGO-like algorithm in an original manner.
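The deviation-number sampling of cEIdevNum reduces to a minimization over a candidate set of u values. A sketch, where `gp_mean_sigma` is a hypothetical callable returning the lists of constraint-GP means and standard deviations at (x_{t+1}, u):

```python
def deviation_number(means, sigmas):
    """DN_c(u) = min_i |m_i| / s_i over the constraint GPs: small when the
    predicted constraint is close to the threshold 0, or when the GP
    variance is high."""
    return min(abs(m) / s for m, s in zip(means, sigmas))

def next_uncertainty(candidates, gp_mean_sigma):
    """Pick u minimizing DN_c over a finite candidate set (the Monte
    Carlo population of the AK-MCS-style sampling step)."""
    return min(candidates, key=lambda u: deviation_number(*gp_mean_sigma(u)))
```

Note that, as stated in the text, this selection rule looks only at the constraints; unlike the criterion S, it ignores the improvement part of the problem.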

Algorithm 3 : cEIdevNum
Create an initial DoE of size t in the joint space and calculate simulator responses: D^(t) = {(x_i, u_i), i = 1, ..., t}, and associated f^(t) and g_i^(t)
while t ≤ maximum budget do
  Create the GPs of the objective and the constraints in the joint space: F^(t) and (G_i^(t))_{i=1,...,l}
  Calculate the GP of the mean objective, Z^(t), in the search space S_X
  Optimize the expected improvement under quantile constraints to determine the next iterate x_{t+1} such that ∀i ∈ {1, ..., l}, q_{1−α/l}(m_Gi^(t)(x, U)) ≤ 0 (Eq. (17))
  Sample the next uncertainty by minimizing the deviation number, u_{t+1} = arg min_{u ∈ S_U} DN_c(u) (Eq. (19))
  Calculate simulator responses at the next point (x_{t+1}, u_{t+1})
  Update the DoE: D^(t+1) = D^(t) ∪ {(x_{t+1}, u_{t+1})}, f^(t+1) = f^(t) ∪ f(x_{t+1}, u_{t+1}), g_i^(t+1) = g_i^(t) ∪ g_i(x_{t+1}, u_{t+1}), i = 1, ..., l; t ← t + 1
end while

Implementation details
Two strategies are adopted depending on the numerical cost of the integrals.

Common Random Numbers for u samples
All three algorithms include Monte Carlo simulations with respect to u samples. The efficiency of all three algorithms is enhanced by a common random numbers (CRN) technique. This means that the same seed is used to generate all random variables throughout the optimization. In particular, the same realizations of the uncertain variables {u_1, ..., u_M}, obtained by a quantization technique [33], are considered in all iterations. The CRN produces more stable optimizations and reduces the variance of the estimated probabilities since the induced error is consistent across different designs. There is however a bias in the Monte Carlo estimations, which we keep small here by choosing a relatively large number of Monte Carlo simulations.
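The variance-reduction effect of CRN can be illustrated on a toy feasibility comparison between two designs: reusing the same u sample makes the estimated difference much less noisy than drawing independent samples (toy constraint g(x, u) = u − x, our own choice for illustration):

```python
import random

def feasibility_estimate(x, u_sample):
    """Empirical P(g(x, U) <= 0) for the toy constraint g(x, u) = u - x."""
    return sum(1 for u in u_sample if u - x <= 0.0) / len(u_sample)

def compare_designs(x1, x2, M, crn, seed=0):
    """Estimated feasibility difference between designs x1 and x2,
    with or without common random numbers."""
    rng = random.Random(seed)
    u1 = [rng.gauss(0.0, 1.0) for _ in range(M)]
    u2 = u1 if crn else [rng.gauss(0.0, 1.0) for _ in range(M)]
    return feasibility_estimate(x1, u1) - feasibility_estimate(x2, u2)
```

With CRN, only the u values falling between the two designs contribute noise to the difference, so its variance is far smaller than with independent samples; this is what stabilizes comparisons between candidate designs along the optimization.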
More precisely, the EFISUR algorithm uses the CRN at different steps:
• In the EFI formula, the calculation of the quantity P(C^(t)(x) ≤ 0) is approximated by

P(C^(t)(x) ≤ 0) ≈ (1/N) Σ_{n=1}^N 1{ (1/M) Σ_{j=1}^M 1{∩_{i=1}^l {G_i^(t,n)(x, u_j) ≤ 0}} ≥ 1 − α },

where N realizations G_i^(t,n) of the GPs are needed.
• In the z_min^feas formula, the calculation of E[C^(t)(x)] is approximated by

E[C^(t)(x)] ≈ 1 − α − (1/M) Σ_{j=1}^M ∏_{i=1}^l Φ(−m_Gi^(t)(x, u_j) / σ_Gi^(t)(x, u_j)).

The experiments reported in this article have as default N = 1000 trajectories of the GPs and M = 300 common random numbers.
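The first bullet above can be sketched as follows, with each GP realization represented by a hypothetical callable returning the constraint values at (x, u_j):

```python
def prob_C_leq_0(trajectories, alpha, u_sample):
    """P(C(x) <= 0) ~= fraction of GP trajectories whose empirical
    feasibility over the CRN sample reaches the level 1 - alpha.
    trajectories: list of callables g_n(u) -> list of constraint values
    at the fixed design point x."""
    M = len(u_sample)
    count = 0
    for g_n in trajectories:
        feas = sum(1 for u in u_sample if all(v <= 0.0 for v in g_n(u))) / M
        if feas >= 1.0 - alpha:
            count += 1
    return count / len(trajectories)
```

Each trajectory yields a Bernoulli outcome (reliable or not at level 1 − α), and the outer average over the N trajectories estimates the probability that the random process C^(t)(x) is non-positive.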
The EFIrand algorithm is identical to EFISUR with the exception of the sampling of the next uncertain point, which is random; hence it has a lower computational complexity.
Concerning the cEIdevNum algorithm, the quantiles defining the constraints, q_{1−α/l}(m_Gi^(t)(x, U)) ≤ 0, are approximated by the corresponding order statistic of the sample {m_Gi^(t)(x, u_j)}_{j=1}^M. For sharper comparisons, we use the same seed (M = 300 common random numbers) to estimate the quantiles.
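The order-statistic approximation of the quantile can be sketched as follows (rank convention: smallest index whose empirical CDF reaches the level):

```python
import math

def empirical_quantile(values, level):
    """Level-quantile of a sample via the order statistic of rank
    ceil(level * M): the smallest sorted value whose empirical CDF
    reaches `level`."""
    s = sorted(values)
    rank = max(1, math.ceil(level * len(s)))
    return s[rank - 1]
```

Because the same M = 300 CRN points are reused, the quantile estimate is a deterministic function of the GP mean, which keeps the constrained optimization of Equation (17) well behaved.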

Computational complexity
Remember that t is the iteration number and the number of data points, t ≤ budget, M is the number of u samples, N is the number of GP trajectories in the simulations, l the number of constraint GPs, and d and m the dimensions of x and u, respectively. The memory usage of the GPs grows in O(t²) and is essentially the same for the EFISUR, EFIrand and cEIdevNum algorithms. The details of the calculations of the time complexities of the three algorithms are provided in Appendix B. The time complexities of the EFISUR and EFIrand algorithms are identical and of the order of max(O(budget³), O(M³)), where the first term in O(budget³) comes from learning the GPs in the augmented space while the second term in O(M³) comes from calculating the EFI. Learning the GPs is the critical component of cEIdevNum, which therefore is computationally cheaper than EFISUR and EFIrand in most situations where M > budget. For large budgets however, the cost of GP construction will dominate that of the EFI, making all three algorithms equivalent in terms of complexity.

Quantization for m_Z^(t+1) samples
The calculation of the first term of the criterion S requires realizations of m_Z^(t+1) (see Equation (14)). To this end, we approximate the continuous distribution of m_Z^(t+1) (given in Equation (15)) by a discrete one. As m_Z^(t+1) follows a Gaussian distribution, we approximate it using the inverse transformation of the Sobol sequence. Thus, the external variance and expectation of Equation (14) are discretized at values representing the distribution of m_Z^(t+1). In the upcoming experiments, a quantizer of size 20 is chosen.
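The paper's quantizer inverts a Sobol sequence through the Gaussian CDF. The Python standard library has no Sobol generator, so the sketch below uses the base-2 van der Corput sequence, which coincides with the first Sobol dimension, together with `statistics.NormalDist.inv_cdf`:

```python
from statistics import NormalDist

def van_der_corput(n, base=2):
    """First n points of the base-2 van der Corput sequence in (0, 1),
    the one-dimensional analogue of the Sobol sequence."""
    pts = []
    for k in range(1, n + 1):
        q, denom, x = k, 1.0, 0.0
        while q > 0:
            q, r = divmod(q, base)
            denom *= base
            x += r / denom
        pts.append(x)
    return pts

def gaussian_quantizer(n, mean, std):
    """Discrete approximation of N(mean, std^2) by inverse transform
    of the low-discrepancy sequence (size-n quantizer)."""
    nd = NormalDist(mean, std)
    return [nd.inv_cdf(p) for p in van_der_corput(n)]
```

A quantizer of size 20, as in the experiments, thus yields 20 well-spread representative values of m_Z^(t+1) over which the external E and VAR of Equation (14) are taken.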

Internal optimizations
The EFISUR algorithm and its two competitors involve several internal optimization problems that must be solved at every iteration. Some of these problems are unconstrained: the maximization of EFI with respect to x (EFISUR and EFIrand algorithms) or the minimization of S or DN_c with respect to u (EFISUR and cEIdevNum). Such internal continuous unconstrained optimization problems are handled with the derivative-free solver BOBYQA [39].
The Problem (17) in the algorithm cEIdevNum further requires a solver that can handle constraints.It is addressed with the COBYLA program [38].
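As an illustration, a constrained internal subproblem of this kind can be solved with SciPy's COBYLA implementation (the quadratic objective and linear constraint below are placeholders, not the paper's GP-based criteria):

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder objective and constraint; the actual internal subproblems
# optimize GP-based criteria under quantile constraints.
def objective(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2

constraints = [{"type": "ineq", "fun": lambda x: 1.0 - x[0] - x[1]}]  # x0 + x1 <= 1

res = minimize(objective, x0=np.zeros(2), method="COBYLA",
               constraints=constraints)
# The unconstrained minimizer (1.0, -0.5) satisfies the constraint,
# so COBYLA converges to it.
```

COBYLA, like BOBYQA, is derivative-free, which matters here since the criteria are only available through numerical evaluation.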

Analytical test case
A first test case is now considered to compare the three competing algorithms in moderate dimensions. A fully random search, Random, is added for reference. The problem has two design variables, two uncertain parameters and a single reliability constraint: By setting the target probability of failure to α = 0.05, the computed reference solution is x* = (−3.62069, −1.896552). This reference was found semi-analytically because some of the calculations are manually tractable in the above problem. Figure 1 shows the contour plots of the functions E[f(., U)] and P(g(., U) ≤ 0) obtained from a 40 × 40 grid of experiments, where at each grid point the expectation and the probability are approximated by a Monte Carlo method over 10^4 realizations of U.
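The reference maps of Figure 1 can be reproduced with a plain Monte Carlo loop of this kind (f and g below are placeholders built from the structural terms quoted later in the text, "−x1·u1" and "+u2^2"; they are not the paper's exact test functions, and the [−5, 5]^2 design domain is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):  # placeholder objective, NOT the paper's test function
    return -x[0] * u[:, 0] + x[1]

def g(x, u):  # placeholder constraint (g <= 0 means feasible)
    return x[0] + x[1] + u[:, 1] ** 2 - 5.0

U = rng.normal(size=(10_000, 2))    # 10^4 realizations of U
grid = np.linspace(-5.0, 5.0, 40)   # 40 x 40 grid, assumed [-5, 5]^2 domain

mean_f = np.empty((40, 40))
prob_feas = np.empty((40, 40))
for i, x1 in enumerate(grid):
    for j, x2 in enumerate(grid):
        x = (x1, x2)
        mean_f[i, j] = f(x, U).mean()            # E[f(x, U)]
        prob_feas[i, j] = (g(x, U) <= 0).mean()  # P(g(x, U) <= 0)
```

Contour plots of `mean_f` and `prob_feas` over the grid then give maps analogous to Figure 1.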
To account for the inherent statistical variability of the algorithms, the runs are repeated 30 times for each method. The initial Design of Experiments of each method is a random Latin hypercube of 4 + (d + m) = 8 points. An additional budget of 56 iterations is used as a stopping criterion. The default Gaussian Process has a Matérn 5/2 covariance function and a constant trend. The performance of the various methods is measured by the average Euclidean distance between the optimum given by the method and the true minimum, at each iteration. The distance to the solution, averaged over the 30 runs, is plotted in Figure 2.
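An initial design of this kind can be generated with SciPy's Latin hypercube sampler (the [−5, 5] bounds on all four coordinates are an assumption for illustration; the paper's actual domain is not restated here):

```python
import numpy as np
from scipy.stats import qmc

d, m = 2, 2                         # uncertain and design dimensions
n_init = 4 + (d + m)                # 8-point initial design, as in the paper
sampler = qmc.LatinHypercube(d=d + m, seed=0)
unit_doe = sampler.random(n_init)   # stratified points in [0, 1]^4
doe = qmc.scale(unit_doe,
                l_bounds=[-5.0] * (d + m),
                u_bounds=[5.0] * (d + m))  # assumed bounds
```

Each column of `doe` has exactly one point per stratum of the domain, the defining property of a Latin hypercube.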
EFISUR and EFIrand converge faster to the optimum than cEIdevNum. After 40 iterations, EFISUR approaches the solution more closely than EFIrand does. A complementary view of the convergence, with dispersions, can be found in Figure 3, which shows the boxplots of the distances to the reference solution at iterations 15 (left panel), 40 (middle) and 60 (right). It is observed that EFISUR leads to an accurate solution from 40 iterations onwards, with a small deviation between the runs. At all iterations, cEIdevNum has a larger mean distance to the solution (but the median is better at 60 iterations) and, mainly, a larger spread in results. Figure 4 shows the "a posteriori" probability, calculated by Monte Carlo, of satisfying the constraint at the current point seen as feasible by the EFISUR and EFIrand strategies at different iterations. Unlike for the EFIrand algorithm, starting from iteration 25, all the points proposed by the EFISUR algorithm satisfy the probabilistic constraint. Thus, the points proposed by EFISUR are not only close to the reference solution, but they are also inside the safe region.

Figure 2: Mean convergence rates (Euclidean distance to the solution) on the analytical test case. The EFISUR method is plotted with green triangles, EFIrand with red circles, cEIdevNum with blue squares and Random with black crosses. The initial DoE which precedes these iterations is not represented.
Figure 5 shows the enrichment in the uncertain space S_U for all methods and all runs. It is clearly visible that EFIrand, in the middle plot, samples the u's randomly. The Random search does the same, of course. EFISUR (left) and cEIdevNum (right) both sample large |u_2|'s because they contribute to constraint violation irrespective of x through the "+u_2^2" term (cf. the g(x, u) expression above). In addition, cEIdevNum samples large |u_1|'s for varied u_2's because they are on the edge of the domain, where the kriging variance is large (hence DN small). EFISUR breaks this symmetry and more sparingly tries small (negative) u_1's associated with varied u_2's because its criterion also accounts for low objective function values: in the optimal region, x* ≈ (−3.6, −1.9), a small negative u_1 provides improvement through the term "−x_1 u_1".
Figure 5: Enrichment in the uncertain space, S_U, for the three methods.
Two partial conclusions can be drawn from this set of experiments. First, the algorithms that rely on the EFI aggregated criterion to choose the next set of design variables, i.e., EFIrand and EFISUR, converge better than cEIdevNum and its constrained problem handled through COBYLA. Second, EFISUR provides an additional gain in efficiency thanks to its sampling criterion S, which properly mixes the uncertainty about constraint satisfaction and improvement.

Industrial test case
We now report the application of the EFISUR method to an aeronautical test case. The NASA rotor 37 is a representative transonic axial-flow compressor that has been used extensively in the computational fluid dynamics (CFD) community to test optimization algorithms and validate CFD codes (see [22]). The optimization of the NASA rotor 37 compressor blade is a challenging test case primarily because of its high dimensionality: it has 20 design variables and 7 uncertain parameters. To the best of the authors' knowledge, such an optimization has never been attempted using global metamodels. Furthermore, the design of the NASA rotor 37 compressor blade is highly nonlinear. And, as is common in CFD, each evaluation of the optimization criteria involves costly finite element analyses. Formally, the optimization problem reads as follows: Because of the dimensionality and the numerical cost, the use of surrogate models is the only path to perform such an optimization. To highlight the importance of carefully sampling in the joint space during the optimization, we compare our approach to the EFIrand algorithm. For comparison purposes, we also carry out a random search. All the methods are started with a DoE of 100 points drawn from an optimal Latin Hypercube Sampling. The enrichment then proceeds by adding 137 points, for a total of 237 model evaluations. Figure 6 shows the convergence of the feasible minimum in terms of mean objective function. We first note that the EFIrand and EFISUR methods very clearly give access to better designs than the random search does. We observe that, after a warm-up phase of about 140 analyses, EFISUR finds better designs than EFIrand does. This behavior is attributed to the only difference between EFISUR and EFIrand, that is, the sampling in the uncertain space. Properly choosing the random variables u results in a more accurate estimation of the criterion of merit, the feasible improvement, in the target region, which translates into more frequent progress.
Figure 7 shows two distributions of the objective: first, in grey, all the estimated mean objective functions m_Z^(T)(x) of the points evaluated during both the initial DoE and the optimization; second, in blue, the estimated mean objective of the best feasible point, z_min^feas, during the optimization iterations. Note that the best feasible objective has shifted from −0.876 at the end of the DoE to −0.90174 after the optimization, representing 51.2% of the objective's range. All mean objectives shown in the Figure are evaluated with the last, most up-to-date, GPs.
Figure 8 shows, at the estimated feasible optimum, the relative value of each design variable with respect to its lower and upper bounds in polar coordinates. The largest radii correspond to the variables which are close to their maximum allowable values, and vice versa.
Because it is generated by EFISUR, the above result assumes that the final GPs are accurate enough to correctly predict the probability of constraint satisfaction. In order to validate the surrogate model accuracy in the vicinity of the limit-state surface, the calculation of the probability of being feasible (Equation (20)) is repeated 500 times with M = 1000 u samples and N = 1000 trajectories of the GPs. Figure 9 provides the statistics of the probability of constraint satisfaction through a boxplot. Accounting for the bootstrap standard deviation, the targeted probability (0.95) remains below the confidence interval of the estimated probabilities. Thus the final GP models of the constraints are deemed sufficiently accurate near the limit-state surface.
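The validation loop can be sketched as follows: each repetition draws M samples of u and N joint constraint trajectories, and returns one estimate of the feasibility probability (the Gaussian stand-in below replaces the actual posterior GP draws at the optimum, and the sizes are reduced for illustration; the paper uses M = N = 1000 and 500 repetitions):

```python
import numpy as np

rng = np.random.default_rng(1)
l, M, N = 2, 200, 200  # constraints, u samples, GP trajectories (reduced)

def draw_constraint_trajectories(n_traj, n_u):
    """Stand-in for jointly sampling the l constraint GPs at n_u points
    of the uncertain space; the real draws come from the posterior GPs."""
    return rng.normal(loc=-1.0, scale=0.5, size=(n_traj, n_u, l))

def feasibility_probability():
    g = draw_constraint_trajectories(N, M)  # shape (N, M, l)
    feasible = (g <= 0).all(axis=2)         # joint feasibility per (trajectory, u)
    return feasible.mean()                  # one estimate of P(C(x) <= 0)

estimates = np.array([feasibility_probability() for _ in range(50)])
# The spread of `estimates` is what the boxplot of Figure 9 summarizes.
```

Comparing the resulting confidence interval to the 0.95 target is the validation criterion described above.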

Concluding remarks
We have proposed a robust Bayesian optimization algorithm to solve computationally intensive chance constrained problems. The algorithm, called EFISUR, carefully models all available information with Gaussian processes built in the augmented space of the design variables and uncertain parameters. New calls to the objective and constraint functions are based on the extrema of an acquisition criterion followed by a sampling criterion. The acquisition criterion is an expected feasible improvement that accounts for both the average improvement in objective function and the constraint reliability. The associated sampling criterion is inspired by the one-step-ahead variance reduction in feasible improvement and is computationally tractable. The article has detailed the analytical expressions of the acquisition and sampling criteria.
EFISUR has been compared to two alternative algorithms which differ in their acquisition and sampling criteria. The results show a gain in favor of the expected feasible improvement and the proposed one-step-ahead variance reduction criterion. This set of criteria accounts for both the objective function and the constraints; it is opportunistic in the sense that it strives for feasible improvement at the next iteration; and both criteria (EFI and S) are coherent because they both relate to the feasible improvement. The sampling criterion follows a principle of maximal uncertainty reduction. The applicability of EFISUR to an industrial test case has also been checked.
From a methodological perspective, further work on the topic might seek to account for the correlation that typically exists between the constraint functions, or between the objective and the constraint functions. This would potentially improve the overall Gaussian model of the optimization functions. It should also make it possible to assign priorities for evaluating the constraints.

We are very grateful to the three anonymous reviewers, whose comments improved the paper.

A.2 Variance of the Improvement at step t + 1
Recall the expression of the variance of the improvement given in Equation (14); it is random through F(x, ũ).
It is now proved that the one-step-ahead mean of F, m_Z^(t+1), is also Gaussian. The proof makes use of the one-step update formula which is found in [12]:
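For reference, the kriging mean update of [12] takes the following form (a sketch in the paper's notation, with $\tilde{x} = (x_{t+1}, u_{t+1})$ the newly evaluated point; the exact conditioning terms should be checked against [12]):

```latex
m_Z^{(t+1)}(x) \;=\; m_Z^{(t)}(x) \;+\;
\frac{c_Z^{(t)}\!\big(x, \tilde{x}\big)}{c_Z^{(t)}\!\big(\tilde{x}, \tilde{x}\big)}
\Big( F(\tilde{x}) - m_Z^{(t)}(\tilde{x}) \Big)
```

Since F(x̃) is Gaussian under the GP model, m_Z^(t+1)(x) is an affine function of a Gaussian variable and is therefore itself Gaussian.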

B Time complexity calculation details
Again, recall that t is the iteration number and the number of data points, t ≤ budget, M is the number of u samples, N is the number of GP trajectories in the simulations, l the number of constraint GPs, and d and m the dimensions of u and x, respectively. The steps for the derivation of the time complexity of the three algorithms are detailed below. It is assumed that the cost of the various internal optimizations is linear in the search space dimension.
• Prediction of a GP (mean, variance, EI) at a point, assuming the covariance matrix was decomposed and inverted before: O(t^2).
• Sampling of a GP trajectory at M points: O(M^3).
• P(C^(t)(x) ≤ 0): it costs N trajectories at M points, that is N × O(M^3).
• E[C^(t)(x)] for the calculation of z_min^feas: M × l predictions, i.e., M × l × O(t^2).
• One calculation of the EFI: the sum of the two above items plus the EI calculation, N × O(M^3) + M × l × O(t^2).
• The maximization of the EFI: the input dimension times the above complexity, but z_min^feas is calculated only once, d × N × O(M^3) + M × l × O(t^2); the dominant term is d × N × O(M^3).
• Variance of the improvement: N predictions, N × O(t^2).
• Minimization of the sampling criterion: the dimension of the uncertainties times the calculation of the sampling criterion, m × max(N, M × l) × O(t^2).
The overall time complexity of EFISUR is max(O(budget^3 × (l + 1) × (d + m)), d × N × O(M^3)). The complexity of EFIrand is the same as that of EFISUR because the GP learning and the EFI maximization, which are common to both algorithms, are driving it. The cost of the sampling criterion minimization is of second order. Now, the cost of the cEIdevNum algorithm is split as follows.
• Calculation of the quantiles: M predictions of l constraints, M × l × O(t^2).
• Calculation of the EI: one calculation of z_min^feas per iteration and a prediction of the GP mean and variance.
The complexity of cEIdevNum is dominated by the cost of learning the GPs, O(budget^3 × (l + 1) × (d + m)). If the number of uncertain parameter samples is larger than the budget of the GP regression, M > budget, then cEIdevNum will be faster to execute than EFISUR and EFIrand.
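As a worked illustration, the dominant terms above can be compared numerically; the helper below only evaluates orders of magnitude (the numeric values are illustrative, not measurements):

```python
def efisur_cost(budget, l, d, m, M, N):
    """Dominant complexity terms of EFISUR/EFIrand: GP learning in the
    augmented space versus the maximization of the EFI."""
    gp_learning = budget**3 * (l + 1) * (d + m)
    efi_maximization = d * N * M**3
    return max(gp_learning, efi_maximization)

def ceidevnum_cost(budget, l, d, m):
    """cEIdevNum is dominated by learning the GPs alone."""
    return budget**3 * (l + 1) * (d + m)

# Illustrative setting with M > budget: the EFI term dominates the cost
# of EFISUR, so cEIdevNum is the cheaper algorithm, as stated above.
budget, l, d, m, M, N = 237, 1, 2, 2, 1000, 1000
ceidevnum_is_cheaper = ceidevnum_cost(budget, l, d, m) < efisur_cost(budget, l, d, m, M, N)
```

For large budgets the O(budget^3) learning term overtakes the EFI term and the three algorithms become equivalent in complexity.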
given by Proposition 3 and Equation (7) for the first and second brackets, respectively. The external E and VAR only concern m_Z^(t+1)(x_targ), whose randomness comes from F(x, ũ), σ_Z^(t+1)(x_targ) not depending on the evaluation of the function. It is proved in Appendix A.2 that m_Z^(t+1)(x_targ) follows a Gaussian distribution.

Figure 1 :
Figure 1: Contour plots of P(g(., U) ≤ 0) and E[f(., U)]: failure and feasible regions in red and green, respectively, the limit-state function in blue, the objective function in dashed black lines. The solution is the yellow bullet.

Figure 3 :
Figure 3: Distance to the reference solution at iterations 15 (left panel), 40 (middle panel) and 60 (right panel) for the three strategies. The boxplots summarize 30 replications of the runs.

Figure 4 :
Figure 4: A posteriori probability of satisfying the constraint at the current "feasible" point z_min^feas for different iterations of the EFISUR and EFIrand strategies.

Figure 6 :
Figure 6: Convergence history of the average objective function of the current feasible minimum, z_min^feas.

Figure 7 :
Figure 7: Histograms of i) in grey, the 237 estimated mean objectives m_Z^(T)(x) of the initial DoE followed by the optimization, and ii) in blue, the 137 mean objectives of the best feasible point generated during the optimization. The red line corresponds to the mean objective of the best feasible point at the first iteration.

Figure 8 :
Figure 8: Relative coordinates of the optimal design with respect to their respective lower and upper bounds.

Figure 9 :
Figure 9: Distribution of the constraint satisfaction probabilities at the optimal design, computed with the final GPs. The boxplot summarizes 500 replications. The dashed line is the lower bound on constraint satisfaction (0.95).