Improving early phase oncology clinical trial design
Case study
Using Bayesian based BOIN and BOP2 designs
This case study presents an overview of two Bayesian model-assisted designs and shows how one or both can be applied to the design of an early phase oncology clinical trial. We discuss each design and its associated operating characteristics, as well as amendments proposed during an ongoing study. We also present simulations to investigate the impact of these amendments on the operating characteristics of each study design. Finally, we discuss lessons learnt, including practical advice for designing smarter early phase oncology trials.
Challenge
The primary objective of a first in human (FIH) oncology phase 1 trial is to identify the maximum tolerated dose (MTD). Zhou et al. (1) have shown that model-based and model-assisted designs outperform simple rule-based designs such as the standard or classical “3+3” design. Where limited information is available about the dose-response model, model-assisted designs are preferred.
The Bayesian optimal interval design (BOIN) (2) is a model-assisted design that is now being used to identify the MTD. The BOIN design allows tailored escalation towards a target toxicity dose level, whilst maintaining control of the specified maximal tolerable rate of toxic events. The BOIN design assumes that toxicity increases monotonically with dose. It is then inferred that maximum efficacy is associated with the MTD. The BOIN design is very simple to implement. The dose selection for the next cohort of patients involves a comparison of the observed rate of dose-limiting toxicities (DLTs) at the current dose with pre-defined boundaries, similar to the “3+3” design.
Once the MTD is established, Simon’s two-stage design (3) is typically used to obtain preliminary efficacy data. This can be either as an expansion cohort in the FIH study, or as a separate phase 2 study. However, Simon designs and other hypothesis testing designs are currently discouraged by the FDA, and alternative designs and sample size justifications based on estimation precision are preferred, especially for single arm proof-of-concept (PoC) trials. The Bayesian optimal design for phase 2 clinical trials (BOP2) (4) is an alternative Bayesian design capable of handling simple (e.g. binary) and complicated (e.g. ordinal, nested and co-primary) endpoints under a unified Bayesian framework.
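The boundary comparison described above can be sketched in a few lines. This is a minimal illustration, assuming the interval boundaries published by Liu and Yuan (2) with their default choices φ1 = 0.6φ and φ2 = 1.4φ; the function names are hypothetical, and the settings used in this study's protocol may have differed.

```python
from math import log

def boin_boundaries(target, phi1=None, phi2=None):
    """Return (lambda_e, lambda_d): escalation and de-escalation boundaries."""
    # Assumed defaults: phi1 = 0.6 * target (sub-therapeutic), phi2 = 1.4 * target (overly toxic).
    phi1 = 0.6 * target if phi1 is None else phi1
    phi2 = 1.4 * target if phi2 is None else phi2
    lambda_e = log((1 - phi1) / (1 - target)) / log(
        target * (1 - phi1) / (phi1 * (1 - target)))
    lambda_d = log((1 - target) / (1 - phi2)) / log(
        phi2 * (1 - target) / (target * (1 - phi2)))
    return lambda_e, lambda_d

def boin_decision(n_dlt, n_treated, lambda_e, lambda_d):
    """Compare the observed DLT rate at the current dose with the boundaries."""
    rate = n_dlt / n_treated
    if rate <= lambda_e:
        return "escalate"
    if rate >= lambda_d:
        return "de-escalate"
    return "stay"

# Target DLT rate of 0.3, the value adopted for Part A of this study.
lam_e, lam_d = boin_boundaries(0.3)
print(round(lam_e, 3), round(lam_d, 3))  # about 0.236 and 0.358
for dlts in range(4):
    print(dlts, "DLTs in 3 patients:", boin_decision(dlts, 3, lam_e, lam_d))
```

The sketch omits the overdose elimination rule that the full BOIN design also applies, so it should be read as an illustration of the escalation logic only.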
Solution
In this case, the project comprised an open-label dose escalation study (Part A) to assess the safety and tolerability of the drug, followed by dose expansion cohorts to obtain preliminary efficacy data (Part B). The MTD was to be determined in Part A, based on the number of dose-limiting toxicities (DLTs) observed at a specific dose level. A BOIN design was used with eight planned escalating dose levels (dose levels 1 to 8). The BOIN design was selected because:
- limited information was available regarding the expected dose-toxicity curve when designing the study,
- the design doesn’t require an in-trial statistician and statistical processing,
- the escalation table is intuitive and can be understood and reviewed by investigators, and
- the design is optimal for MTD finding per Bayesian decision-theoretic criteria.
After enrolment into dose level 1, subsequent dose levels were opened only if the previous dose level was deemed tolerable. The first dose level was to enrol a minimum of one patient. If a Grade 2 or higher adverse event (AE) was observed during the evaluation period, or once dose level 5 was reached, a minimum of three patients were to be enrolled per dose level in accordance with the BOIN design dosing rules. From the first dose level that enrolled three patients onwards, patients were recruited in cohorts of three.
To further characterise the safety and assess clinical activity of the drug, Part B employed a BOP2 design. Three expansion cohorts were planned, each comprising patients with different advanced solid tumours with locally advanced or metastatic, non-resectable disease, which had progressed despite treatment with standard first line treatment. The cohorts were considered independent of each other. The dose for each cohort was to be determined after completion of Part A. The BOP2 design was selected since fewer patients were needed to be able to assess whether there was sufficient activity to warrant further investigation in a phase 3 pivotal trial.
In both parts of the study, the drug was administered as a single intravenous infusion approximately every 14 days for a total of two infusions per treatment cycle. A treatment cycle for observing DLTs in Part A was therefore defined as 28 days. For Part B, efficacy was assessed over two complete treatment cycles. Part A of the study is complete. Part B is ongoing.
BOIN study design assumptions
In Part A, the target toxicity rate for the MTD was set as ≤30% (i.e. φ = 0.3), and the maximum sample size as 27 with a maximum of 12 patients per cohort. After completion of Part A, the MTD estimated by isotonic regression was to be used in Part B. Figure 1 summarises the study design for Part A. The decision rules for Part A are summarised in Table 1.
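To illustrate how an MTD can be read off with isotonic regression, the sketch below pools adjacent dose levels whose observed DLT rates violate monotonicity (the pool-adjacent-violators algorithm) and then picks the dose whose smoothed rate is closest to the target. It is a simplified stand-in rather than the exact procedure of the BOIN software: it works on raw observed rates weighted by patient numbers, resolves ties in favour of the lower dose, and uses invented DLT counts.

```python
def pava(rates, weights):
    """Weighted pool-adjacent-violators: return non-decreasing fitted rates."""
    blocks = []  # each block: [weighted mean, total weight, doses pooled]
    for rate, weight in zip(rates, weights):
        blocks.append([rate, weight, 1])
        # Merge backwards while an earlier block has a higher rate than a later one.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def select_mtd(n_dlt, n_treated, target=0.3):
    """Return (1-based dose level closest to target, fitted rates for tried doses)."""
    tried = [i for i, n in enumerate(n_treated) if n > 0]
    rates = [n_dlt[i] / n_treated[i] for i in tried]
    fitted = pava(rates, [n_treated[i] for i in tried])
    # min() keeps the first (lowest) dose when two doses are equally close.
    best = min(range(len(tried)), key=lambda j: abs(fitted[j] - target))
    return tried[best] + 1, fitted

# Hypothetical end-of-escalation data for five dose levels (not study data).
n_dlt = [0, 0, 1, 0, 2]
n_treated = [1, 3, 6, 3, 6]
mtd, fitted = select_mtd(n_dlt, n_treated)
print(mtd, [round(f, 3) for f in fitted])
```

In this invented example the observed rate at dose level 4 (0/3) is lower than at dose level 3 (1/6), so the two levels are pooled to a common rate of 1/9 before the closest dose to 0.3 is chosen.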


BOIN operating characteristics
The operating characteristics of the study design were assessed via simulation, using the shiny app “BOIN” published by MD Anderson Software (5). Figure 2 presents the different scenarios that were simulated, which represented a range of possible outcomes.

The results of the simulations are presented in Figure 3. The operating characteristics show that the design selects the true MTD, if any, with high probability and allocates more patients to the dose levels with the DLT rate closest to the target of 0.3.
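The kind of simulation behind these operating characteristics can be sketched as follows. This is a simplified illustration and not a substitute for the BOIN app: it assumes cohorts of three from the first dose level, boundaries of 0.236 and 0.358 (the values for a target of 0.3 under the default BOIN settings), no overdose elimination rule, and a trial that stops once the total sample size or the per-dose maximum would be exceeded. The true DLT rates are invented.

```python
import random

def simulate_allocation(true_rates, lambda_e=0.236, lambda_d=0.358, cohort=3,
                        max_n=27, max_per_dose=12, n_trials=2000, seed=1):
    """Return the average number of patients treated at each dose level."""
    rng = random.Random(seed)
    n_doses = len(true_rates)
    totals = [0] * n_doses
    for _ in range(n_trials):
        treated = [0] * n_doses
        dlts = [0] * n_doses
        dose = 0  # start at the lowest dose level
        while (sum(treated) + cohort <= max_n
               and treated[dose] + cohort <= max_per_dose):
            # Each patient in the cohort has a DLT with the true rate of this dose.
            dlts[dose] += sum(rng.random() < true_rates[dose] for _ in range(cohort))
            treated[dose] += cohort
            rate = dlts[dose] / treated[dose]
            if rate <= lambda_e and dose < n_doses - 1:
                dose += 1  # escalate
            elif rate >= lambda_d and dose > 0:
                dose -= 1  # de-escalate
            # otherwise stay at the current dose
        for j in range(n_doses):
            totals[j] += treated[j]
    return [t / n_trials for t in totals]

# Hypothetical scenario in which dose level 3 has the target DLT rate of 0.3.
average = simulate_allocation([0.02, 0.05, 0.30, 0.60, 0.80])
print([round(a, 1) for a in average])
```

Under this invented scenario, the dose level with a true DLT rate of 0.3 should receive the largest share of patients on average, which is the behaviour described for the simulated design.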

BOIN amendments
Whilst enrolling patients into dose levels 1 to 5, it became clear that the drug was benign from a safety perspective. Further, newly available pharmacodynamics data indicated that, rather than DLTs increasing monotonically with dose and maximum efficacy corresponding to the highest dose, efficacy might actually follow a bell-shaped dose-response curve. The biological explanation is that at high concentrations the target engagers for the drug may become highly saturated, resulting in “insulating effects” which could then restrict the efficacy of the drug. Consequently, the study design needed to be adapted to maximise both safety and pharmacodynamics information in Part A, with no increase in patient numbers, costs or study timelines.
To explore the pharmacodynamics-dose relationship in more detail, modifications to the BOIN study design were required. The modifications considered included: 1) stopping the trial if a maximum of 9 patients for a dose level was reached, 2) reducing cohort sizes to 2 and 3) studying 7 dose levels. Simulations were conducted to assess the impact of the three options on both the total number of patients needed for Part A and the operating characteristics of the revised study designs.
The initial simulations showed that the expected patient numbers fell when the maximum number of patients per dose level was set to 9, with minimal impact on the study operating characteristics. However, the total number of patients fell by only 3 to 5. Reducing cohort sizes to 2 patients further reduced the expected patient numbers, but the frequency with which the dose matching the MTD target probability of 0.3 is reliably identified can be substantially affected.
Figure 4 presents the results of the simulations for a seven dose level BOIN design, given the observed DLT profile to date. Scenario 1 was not included in the simulations since the accumulated data supported the MTD being higher than dose level 2. The number of patients to be treated at each dose level is clearly reduced for a seven dose level BOIN design, thereby achieving a greater saving in terms of patient numbers and time than the other options. Using a design with seven dose levels, Part B could start earlier and additional patients could be recruited into Part B to provide a better estimate of the pharmacodynamics-dose response relationship and efficacy within each cohort. A protocol amendment was subsequently implemented to remove dose level 8 from the study.

BOP2 study design assumptions
In Part B, efficacy was to be assessed using the RECIST-defined objective response rate (ORR) endpoint (6). A null hypothesis of H0: Peff ≤ 0.05, representing an inefficacious treatment, and an alternative hypothesis of H1: Peff ≥ 0.25, representing an efficacious treatment, were assumed. Assuming a Beta(0.05, 0.95) prior distribution for Peff, Table 2 summarises the stopping boundaries that yielded a statistical power of 0.8936 under H1. If the total number of patients reached the maximum sample size of 25, the null hypothesis was to be rejected if the number of responses was greater than two; otherwise, the treatment was deemed not promising. The go/no-go criteria were non-binding. The futility analyses were based on the 16-week response data.
Table 2: Optimised stopping boundaries for BOP2 design
Number of patients treated | Stop if number of responses ≤ |
---|---|
10 | 0 |
15 - optional | 1 |
25 | 2 |
The operating characteristics of the design, obtained using the shiny app “BOP2” published by MD Anderson Software (5), are summarised in Table 3.
Table 3: BOP2 operating characteristics
True response rate | Early stopping (%) | Claim promising (%) | Average sample size |
---|---|---|---|
0.01 | 99.09 | 0.07 | 10.6 |
0.05 | 83.72 | 8.71 | 13.6 |
0.10 | 58.07 | 33.40 | 17.4 |
0.15 | 35.48 | 59.40 | 20.5 |
0.20 | 19.97 | 77.77 | 22.5 |
0.25 | 9.82 | 89.36 | 23.7 |
0.30 | 4.79 | 94.91 | 24.7 |
0.35 | 2.33 | 97.62 | 24.7 |
0.40 | 1.14 | 98.86 | 24.9 |
0.45 | 0.42 | 99.58 | 24.9 |
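The operating characteristics of a boundary-based design like this one can also be checked with an exact binomial calculation rather than simulation. The sketch below takes the three looks from Table 2 and assumes that the trial stops, or the treatment is declared not promising at the final look, when the cumulative number of responses is at or below the boundary; this reading is an assumption chosen to match the final-analysis rule stated in the text. The function name and structure are illustrative only.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k responses among n patients with response rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def bop2_exact_oc(p, looks=((10, 0), (15, 1), (25, 2))):
    """Return (P(early stop), P(claim promising), expected sample size).

    looks: (cumulative patients, stop if responses <= boundary); last look is final.
    """
    dist = {0: 1.0}  # responses so far -> probability of still being on study
    n_prev = 0
    early_stop = 0.0
    promising = 0.0
    expected_n = 0.0
    for i, (n, bound) in enumerate(looks):
        step = n - n_prev
        new = {}
        for x, prob in dist.items():
            for k in range(step + 1):
                new[x + k] = new.get(x + k, 0.0) + prob * binom_pmf(k, step, p)
        if i == len(looks) - 1:
            # Everyone who reaches the final look contributes the full sample size.
            expected_n += n * sum(new.values())
            promising = sum(prob for x, prob in new.items() if x > bound)
        else:
            stopped = sum(prob for x, prob in new.items() if x <= bound)
            early_stop += stopped
            expected_n += n * stopped
        dist = {x: prob for x, prob in new.items() if x > bound}
        n_prev = n
    return early_stop, promising, expected_n

for rate in (0.05, 0.25):
    stop, claim, size = bop2_exact_oc(rate)
    print(rate, round(100 * stop, 2), round(100 * claim, 2), round(size, 1))
```

For a true response rate of 0.25 this gives an early stopping probability of about 10.1%, a probability of claiming promise of about 89.1% and an expected sample size of about 23.7, which is close to the simulated values in Table 3; small differences are to be expected because the app's figures come from simulation.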
BOP2 amendments
Given the benign safety profile of the drug established in Part A, Part B was subsequently expanded to include up to eight cohorts. Seven cohorts comprised patients with different but specific tumour types, and followed the BOP2 design with ORR as the efficacy endpoint. The exception was a cohort where the goal was to compare the new drug against standard of care for a specific tumour type, in a 2:1 randomisation. For this comparative cohort, the analysis was amended to a Bayesian logistic regression model with treatment and a pre-specified covariate.
The operating characteristics of the design and analysis method for the comparative cohort were investigated using simulation, assuming different prior distributions based on the KEYNOTE-048 study with pembrolizumab (7) and a vague prior. The simulations showed that a sample size of 180 patients in a 2:1 randomisation provides approximately 90% power to detect a difference of 15% between the treatment groups, with an estimated two-sided Type 1 error rate of 0.0391. The margin of error is approximately ± 4% for the Type 1 error calculations. The protocol subsequently included an interim analysis after a minimum of 60 enrolled patients. Once the required patients are enrolled, recruitment is to be paused until the accumulated data are analysed and regulatory authorities consulted. The cohort will be considered futile (non-binding) if the probability of observing a difference in estimated effects between control and treatment of at least 15% is less than 0.1.
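The shape of the futility check can be illustrated with a much simpler model than the one used in the study. Instead of the Bayesian logistic regression with a covariate, the sketch below places independent Beta priors on the two response rates and estimates, by Monte Carlo, the posterior probability that the treatment response rate exceeds the control rate by at least 15 percentage points. The vague Beta(1, 1) prior, the direction of the difference, the function names and the interim counts are all assumptions made for illustration.

```python
import random

def prob_difference_at_least(resp_trt, n_trt, resp_ctl, n_ctl, delta=0.15,
                             prior=(1.0, 1.0), draws=20000, seed=1):
    """Monte Carlo estimate of Pr(p_trt - p_ctl >= delta | data)."""
    rng = random.Random(seed)
    a, b = prior
    hits = 0
    for _ in range(draws):
        # Conjugate Beta posteriors for each arm's response rate.
        p_trt = rng.betavariate(a + resp_trt, b + n_trt - resp_trt)
        p_ctl = rng.betavariate(a + resp_ctl, b + n_ctl - resp_ctl)
        if p_trt - p_ctl >= delta:
            hits += 1
    return hits / draws

def is_futile(probability, threshold=0.1):
    """Non-binding futility flag: probability of the target difference is too low."""
    return probability < threshold

# Hypothetical interim data: 60 patients randomised 2:1 (40 treatment, 20 control).
prob = prob_difference_at_least(resp_trt=8, n_trt=40, resp_ctl=6, n_ctl=20)
print(round(prob, 3), is_futile(prob))
```

With these invented counts the treatment arm responds less often than control, so the estimated probability falls well below 0.1 and the futility flag is raised; a model with covariates and informative priors, as used in the study, could give a different answer for the same data.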
Outcome
Publications have shown that using model-based and model-assisted designs can improve the estimation of the MTD and can provide a more reliable assessment of whether a drug is efficacious in early phase oncology studies. As a result, these types of designs are starting to be utilised more widely. Model-assisted designs such as the BOIN method have the advantage of being flexible with respect to cohort sizes, and of allowing dose escalation or de-escalation decisions to be made from the observed DLTs, akin to “3+3” study designs. The BOP2 method explicitly controls the type 1 error rate, thereby bridging the gap between Bayesian designs and frequentist designs. Both are also flexible enough to respond to changes in study objectives that the research team may want to implement during a study.
When designing an early phase oncology study, it is imperative to discuss the study objectives and any associated underlying study design assumptions in a transparent and clear manner with all stakeholders and, where possible, to support assumptions with pre-clinical data. In this case study, had efficacy been known to follow a bell-shaped curve when the study was designed, the underlying assumption of monotonically increasing efficacy and safety would not have been made. An alternative, more efficient design to explore both safety and efficacy simultaneously is the BOIN-ET design (8).
Although Bayesian-based BOIN and BOP2 methods are a significant improvement over traditional frequentist methods such as a 3+3 design with an extension cohort, opportunities still exist to implement even smarter designs. One example is deploying basket designs within a master protocol (9) when obtaining preliminary efficacy data. In Part B, although one could argue the protocol was equivalent to a master protocol, cohorts were assumed to be independent of each other. Opportunities exist to borrow information across cohorts to improve the efficiency of the statistical analysis using a Bayesian framework. These designs are becoming more popular and are encouraged by regulators.
In conclusion, ICON has the expertise and ability to design smarter, innovative early phase oncology studies, which can reduce costs and increase success rates.
References
1. Zhou H, Murray TA, Pan H, Yuan Y (2018) Comparative review of novel model-assisted designs for phase I clinical trials. Statistics in Medicine. 37 (14): 2208-2222
2. Liu S, Yuan Y (2015) Bayesian optimal interval designs for phase I clinical trials. J. R. Stat. Soc. Ser. C Appl. Stat. 64 (3): 507-523
3. Simon R. Optimal two-stage designs for phase II clinical trials. Available at https://brb.nci.nih.gov/techreport/Optimal2-StageDesigns.pdf. Last accessed 10th January 2022
4. Zhou H, Lee JJ, Yuan Y (2017) BOP2: Bayesian optimal design for phase II clinical trials with simple and complex endpoints. Statistics in Medicine. 21: 3302-3314
5. Integrated platform for designing clinical trials. MD Anderson. Available at https://trialdesign.org/#newsSection. Last accessed 11th January 2022
6. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, Dancey J, Arbuck S, Gwyther S, Mooney M, Rubinstein L, Shankar L, Dodd L, Kaplan R, Lacombe D, Verweij J (2009) New response evaluation criteria in solid tumors: Revised RECIST guideline (version 1.1). Available at https://ctep.cancer.gov/protocoldevelopment/docs/recist_guideline.pdf. Last accessed 10th January 2022
7. Burtness B, Harrington KJ, Greil R, Soulieres D, Tahara M, de Castro G, et al. (2019) Pembrolizumab alone or with chemotherapy versus cetuximab with chemotherapy for recurrent or metastatic squamous cell carcinoma of the head and neck (KEYNOTE-048): a randomised, open-label, phase 3 study. The Lancet. 394: 1915-1928
8. Takeda K, Taguri M, Morita S (2018) BOIN-ET: Bayesian optimal interval design for dose finding based on both efficacy and toxicity outcomes. Pharmaceutical Statistics. 17: 383-395
9. Drazen JM, Harrington DP, McMurray JJV, Ware JH, Woodcock J (2017) Master protocols to study multiple therapies, multiple diseases, or both. New England Journal of Medicine. 377: 62-70