An integrated tumor, immune and microbiome atlas of colon cancer

Nature Medicine volume 29, pages 1273–1286 (2023)


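The lack of multi-omics cancer datasets with extensive follow-up information hinders the identification of accurate biomarkers of clinical outcome. In this cohort study, we performed comprehensive genomic analyses on fresh-frozen samples from 348 patients affected by primary colon cancer, encompassing RNA, whole-exome, deep T cell receptor and 16S bacterial rRNA gene sequencing of tumor and matched healthy colon tissue, complemented with tumor whole-genome sequencing for further microbiome characterization. A type 1 helper T cell, cytotoxic gene expression signature, called the Immunologic Constant of Rejection (ICR), captured the presence of clonally expanded, tumor-enriched T cell clones and outperformed conventional prognostic molecular biomarkers such as the consensus molecular subtype and microsatellite instability classifications. Quantification of genetic immunoediting, defined as a lower number of neoantigens than expected, further refined its prognostic value. We identified a microbiome signature, driven by Ruminococcus bromii, that is associated with a favorable outcome. By combining the microbiome signature and the Immunologic Constant of Rejection, we developed and validated a composite score (mICRoScore) that identifies a group of patients with excellent survival probability. The publicly available multi-omics dataset provides a resource for better understanding colon cancer biology and could facilitate the discovery of personalized therapeutic approaches.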

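Despite a considerable amount of research on biomarkers for primary colon cancer, current clinical guidelines in the United States and Europe (including those of the National Comprehensive Cancer Network and the European Society for Medical Oncology) rely only on tumor–node–metastasis staging and the detection of DNA mismatch repair (MMR) deficiency or microsatellite instability (MSI), in addition to standard clinicopathological variables, to determine treatment recommendations 1,2. MSI is caused by somatic or germline deficiency of MMR genes and leads to the accumulation of somatic mutations, neoantigens that trigger immune recognition, and a high density of tumor-infiltrating lymphocytes 3.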

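For example, the strength of the in situ adaptive immune reaction, as captured by assessing the density and spatial distribution of T cells (Immunoscore), [...] MSI status 4,5.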

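However, despite the overwhelming evidence for the prognostic effect of the Immunoscore and other immune-related parameters in colon cancer 6,7, the research community has noted a lack of association between gene-expression-based estimates of the immune response and patient survival in The Cancer Genome Atlas (TCGA) colon adenocarcinoma (COAD) cohort 8,9,10. TCGA is an excellent dataset for omics analyses because of the richness and curation of its genomic data. However, collecting comprehensive clinical data, including survival outcomes, was neither a primary objective of TCGA nor a practical possibility given its worldwide scope and time constraints 11. As a result, the limited patient follow-up data associated with TCGA-COAD and other TCGA datasets have hampered statistically rigorous survival analyses 11. In addition, TCGA did not include dedicated assays for T cell receptor (TCR) repertoire analysis or microbiome characterization. These analyses were performed later using bulk DNA and RNA sequencing (RNA-seq) data, and only a small number of healthy solid tissue (for example, healthy colon) samples were included 12,13. Furthermore, because TCGA initially focused on cataloging the genomic and molecular alterations that occur in cancer cells, sample inclusion criteria based on stringent tumor purity cutoffs were imposed 14, potentially biasing the population toward tumor specimens that are less rich in immune or stromal cells.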

[...] 0.1% in the tumor, which are at least 32 times higher in the tumor compared to normal) are highlighted. i, Correlation of the proportion of tumor-enriched T cell clones in the tumor (in percent) with ICR score. Pearson’s r and P value of the correlation are indicated in the plot. All P values are two-sided.

[...] >12 per Mb. Overall P value is calculated by log-rank test. c, Scatter-plot of ICR score by genetic immunoediting (GIE) value for ICR-high and ICR-low samples. Number of samples in each quadrant is indicated in the graph. Gray area delineates ICR scores from 5–9. d, Kaplan–Meier for OS by IES. Censor points are indicated by vertical lines and corresponding table of number of patients at risk in each group is included below the Kaplan–Meier plot. Overall P value is calculated by log-rank test. e, Violin plot of IES by productive TCR clonality (immunoSEQ) (left) and MiXCR-derived TCR clonality (right). Spearman correlation statistics are indicated above each plot. Significance within ICR low and high is indicated. Center line, box limits and whiskers represent the median, interquartile range and 1.5× interquartile range, respectively. P values are two-sided; n reflects the independent number of samples.

[...] > 2) (Fig. 5c and annotated in Supplementary Table 5). No major difference in α diversity (the variety and abundance of species within an individual sample) was observed between tumor and healthy samples (Extended Data Fig. 7b) and only a modestly reduced microbial diversity was observed in ICR-high versus ICR-low tumors (Extended Data Fig. 7b). Selenomonas and Selenomonas 3 were the taxa most significantly increased in ICR-high versus ICR-low tumors (Fig. 5e, Extended Data Fig. 7c and Supplementary Table 6). In terms of survival analysis, the highest number of nominally significant associations was obtained using tumor data (rather than healthy colon data) and OS as the end point (Extended Data Fig. 7d and Supplementary Table 7).

[...] 20-fold coverage of at least 99% of targeted exons and >70-fold in at least 81% of targeted exons. In healthy samples, sequencing achieved >20-fold coverage of at least 94% of targeted exons and >30-fold in at least 84% of targeted exons. Adaptor trimming was performed using the tool trimadap (v.0.1.3). ConPair was run to evaluate concordance and estimate contamination between matched tumor–normal pairs. In eight of the pairs a mismatch was detected and for five pairs a potential contamination was indicated. HLA typing data were used to validate these results. All potential mismatches and contaminations were excluded, retaining 281 patients for data analysis.

[...] 2 µg) and sample selection was exclusively based on DNA availability. TCR sequencing was performed using extracted DNA of 114 primary tissue samples and ten matched healthy colon tissues with sufficient DNA available.

[...] >0.1% were defined as tumor-enriched sequences, as previously implemented by Beausang et al.75. The fraction of tumor-enriched TCR sequences in the tumor was calculated by dividing the number of productive templates of tumor-enriched sequences by the total number of productive templates per tumor sample. Pearson’s correlation coefficient between the fraction of tumor-enriched TCR sequences and ICR score was calculated.

[...] >1% in the general population. After these technical exclusion criteria, biological filters were applied, including selection of nonsynonymous mutations (frame shift deletions, frame shift insertions, inframe deletions, inframe insertions, missense mutations, nonsense mutations, nonstop mutations, splice site and translation start site mutations). The resulting number of variants/mutations per Mb (capture size is 40 Mb) per sample is referred to as the nonsynonymous TMB. Next, to identify the most frequently mutated genes in our cohort that might play a role in cancer, we excluded variants that are predicted to be tolerated according to SIFT annotation or benign according to PolyPhen (polymorphism phenotyping). Finally, all artifact genes, which are typically encountered as bystander mutations in cancer that are mutated for example as a consequence of a high homology of sequences in the gene, were excluded76. The OncoPlot function from ComplexHeatmap (v.2.1.2) was used to visualize the most frequent somatic mutations.

[...] >5% of the tumor samples) with frequencies detected in previously published datasets containing colon cancer samples (TCGA-COAD and NHS-HPFS) as well as reported cancer driver genes32 or colon oncogenic mediators38. First, we extracted genes with a nonsynonymous mutation frequency >5% in the AC-ICAM cohort.
Subsequently, only genes that are likely involved in cancer development, as described in the section ‘Cancer-related gene annotation’, were retained. All artifact genes (mutations typically encountered as bystander mutations in cancer that are mutated for example as a consequence of a high homology of sequences in the gene) were excluded. Genes that have previously been reported as a colon cancer oncogenic mediator38 or cancer driver gene for colorectal cancer (COADREAD)32 were also excluded. Finally, only genes with a mutation frequency <5% in the NHS-HPFS colon cancer cohort37 and <5% in TCGA-COAD36 were maintained. As a final filter, only genes that had a nonsynonymous mutation frequency of at least twofold in AC-ICAM compared to TCGA-COAD were labeled as potentially new in colon cancer.

[...] > 0.4) or MSS (MANTIS score ≤ 0.4).

[...] > 500 nM, were used as criteria to infer neoantigens. Predicted neoantigens were used to calculate the GIE value. We calculated the GIE value by taking the ratio between the number of observed versus the number of expected neoantigens. The expected number of neoantigens was based on the assumption of a linearity between TMB and the number of neoantigens. We therefore assumed that samples that have a lower frequency of neoantigens than expected (lower GIE values) display evidence of immunoediting. A higher frequency of neoantigens than expected indicates a lack of immunoediting; see the calculations section for details.

[...] >60× coverage per sample. The median (across samples) of the average target coverage (per sample) was 76× (range of 50–92).

[...] > ±0.3. Clusters among the networks (groups of at least three correlated genera using the cutoffs specified above) were defined via a fast greedy clustering algorithm.
All co-occurrence networks were made using the R package ‘NetCoMI (v.1.1.0) – Network Construction and Comparison for Microbiome Data’84 and visualized using Cytoscape (v.3.9.1).

[...] >0) and ‘low-risk’ (<0) groups as performed in the training set. Therefore, no cutoff optimization occurred in the validation phase.

[...] >2 µg). Securing additional funds allowed us to perform WGS and 16S rRNA sequencing and to expand the WES and TCR analyses to any sample with sufficient DNA available. No specific power calculation was performed at that time and the targeted sample size was based on the estimated number of samples that could be retrieved from LUMC (n = 400), which compared favorably with the sample size of similar studies in the field.

[...] >90% to detect a 10% mutational frequency in 90% of genes86.

[...] >80% for an HR of 0.5 with a two-sided α of 0.05. With 154 OS events in the whole cohort, our study has a power of 90% for an HR of 0.59 (assuming two groups of equal size) and a power of 90% for an HR of 0.57 (assuming groups with unequal sample size, 2:1) with a two-sided α of 0.05.
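The detectable hazard ratios quoted above for 154 OS events can be checked with Schoenfeld's approximation for the log-rank test, which links the number of events to the detectable log hazard ratio. The source does not say which formula or software the authors used, so this is only a sketch under that assumption; the function name `detectable_hazard_ratio` and its parameters are illustrative and do not come from the paper.

```python
from math import exp, sqrt
from statistics import NormalDist


def detectable_hazard_ratio(events, power=0.90, alpha=0.05, frac_group1=0.5):
    """Hazard ratio (< 1) detectable with the given power under Schoenfeld's
    approximation: events = (z_alpha + z_beta)^2 / (p1 * p2 * ln(HR)^2).

    Assumption: this is an illustrative check, not the authors' stated method.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    p1 = frac_group1           # fraction of patients in the first group
    p2 = 1.0 - p1              # fraction of patients in the second group
    log_hr = -(z_alpha + z_beta) / sqrt(events * p1 * p2)
    return exp(log_hr)


# 154 OS events, as reported for the whole cohort
hr_equal = detectable_hazard_ratio(154)                       # groups of equal size
hr_unequal = detectable_hazard_ratio(154, frac_group1=2 / 3)  # 2:1 group sizes

print(f"equal groups: HR = {hr_equal:.2f}")
print(f"2:1 groups:   HR = {hr_unequal:.2f}")
```

Under these assumptions the sketch gives about 0.59 for equal groups and about 0.57 for a 2:1 split, in line with the values reported in the text.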

[...] 0.1% in the tumor, that are at least 32 times more abundant in the tumor compared to the normal.

[...] >12/Mb) versus Low (<12/Mb) TMB. b, Same as a, but only including ICR Medium. c, Kaplan–Meier curves for OS by GIE status. d, Same as c in ICR Medium patients. Overall P value is calculated by log-rank test and P value corresponding to HR is calculated using Cox proportional hazard regression (a–d). e, Stacked bar charts of mutational load category (top) and MSI status (bottom) per IES. f, Kaplan–Meier curves for OS (left) and PFS (right) stratified by AJCC pathological stage (I, II, III) within IES4. Stratification was not performed for stage IV due to the limited number (n = 2). g, Stacked bar chart of distribution of AJCC Pathological Tumor Stage by IES. h, Multivariate Cox proportional hazards model for OS including IES (ordinal, IES1, IES2, IES3, IES4) and AJCC Pathological Tumor Stage (ordinal, Stage I, II, III, IV). P values corresponding to HR calculated by Cox proportional hazard regression analysis are indicated. i, Violin plot represents TCR clonality as determined by MiXCR in ICR Medium samples. Center line, box limits and whiskers represent the median, interquartile range and 1.5× interquartile range, respectively. P value calculated by unpaired, two-sided t-test. j, Results of the multiple linear regression model showing the respective contributions of productive TCR clonality (X1) and (X2) for prediction of IES (Y). Corresponding significance of the effects is indicated in the scatter-plots (left). k, Local Polynomial Regression Fitting of productive TCR clonality by IES (ordinal variable). The gray band reflects the 95% confidence interval for predictions of the local polynomial regression model. All P values are two-sided; n reflects the independent number of samples in all panels. Overall Survival (OS). Tumor Mutational Burden (TMB). Genetic Immunoediting (GIE). ImmunoEditing Score (IES).

[...] > 0).
d, Concordance index of optimal multivariate Cox regression model per dataset. The cross-validation performance highlights the mean concordance of 10 different folds with the optimal hyperparameters (gamma and lambda), that is, the same parameters as the optimal model. e, Forest plot with HR (center), corresponding 95% confidence intervals (error bars) and P value calculated by Cox proportional hazard regression analysis for OS, using: 1) the 16S MBR score in AC-ICAM, 2) WGS R. bromii abundance, 3) PCR-based R. bromii abundance, 4) 16S Ruminococcus 2 relative abundance and 5) MBR score calculated using WGS data. f, Heat map of Spearman correlation between the relative abundance of the MBR classifier taxa in tumor samples and immune traits. Only correlations with an FDR > 0.1 are visualized. An additional row is added for Ruminococcus 2 showing all correlations, unfiltered for FDR. * The taxonomical order is indicated between brackets, as family was unassigned. g, Kaplan–Meier curve for PFS in AC-ICAM, with all patients stratified by mICRoScore High vs Low. HR and P value are calculated using Cox proportional regression. h, AJCC pathological stage within the mICRoScore High group in AC-ICAM and within TCGA-COAD. i, Kaplan–Meier curve for PFS in AC-ICAM, with all patients with ICR High stratified by mICRoScore. Overall P value is calculated by log-rank test and P value corresponding to HR is calculated using Cox proportional hazard regression. Overall Survival (OS), Progression-Free Survival (PFS). All P values are two-sided; n reflects the independent number of samples in all panels.
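As an illustration of the genetic immunoediting (GIE) value defined in the methods (the ratio of observed to expected neoantigens, with the expected count assumed to be linear in TMB), a minimal sketch follows. The 40 Mb capture size comes from the text. The ordinary least-squares fit with an intercept, the toy numbers and all function names are assumptions made for illustration, since the source only points to a calculations section for the actual procedure.

```python
CAPTURE_SIZE_MB = 40.0  # capture size stated in the methods


def nonsynonymous_tmb(n_variants, capture_size_mb=CAPTURE_SIZE_MB):
    """Nonsynonymous variants per Mb of captured sequence."""
    return n_variants / capture_size_mb


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept.

    Assumption: the paper only states that linearity between TMB and
    neoantigen count is assumed; the fitting procedure is hypothetical.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gie_values(tmb, observed_neoantigens):
    """GIE = observed / expected neoantigens, expected taken from the fitted line.

    Assumes every expected count is positive.
    """
    slope, intercept = fit_line(tmb, observed_neoantigens)
    expected = [slope * t + intercept for t in tmb]
    return [obs / exp_count for obs, exp_count in zip(observed_neoantigens, expected)]


# Toy cohort (made-up numbers, not data from the study)
variant_counts = [80, 160, 240, 320]
observed = [10, 30, 20, 40]

tmb = [nonsynonymous_tmb(v) for v in variant_counts]
gie = gie_values(tmb, observed)

for t, g in zip(tmb, gie):
    # Fewer neoantigens than expected (ratio below 1) is read as evidence of immunoediting
    label = "fewer than expected" if g < 1 else "at least as many as expected"
    print(f"TMB = {t:.1f}/Mb  GIE = {g:.2f}  ({label})")
```

In this toy cohort the first and third samples carry fewer neoantigens than the fitted line predicts, so their GIE values fall below 1, which the methods interpret as evidence of immunoediting.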