A classification model based on G protein-coupled estrogen receptor binding affinity: development and validation

Affiliations:

1 Hebei Key Laboratory of Basic Medicine for Chronic Diseases, School of Basic Medicine, North China University of Science and Technology, Tangshan, Hebei 063210, China; 2 China National Center for Food Safety Risk Assessment, Beijing 100022, China; 3 Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; 4 State Key Laboratory of Resource Insects, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing 100093, China; 5 Fangchenggang Hospital of Traditional Chinese Medicine, Fangchenggang, Guangxi 538021, China

About the author:

Gao Mengmeng, female, graduate student; research interest: basic medicine. E-mail: gaomm@stu.ncst.edu.cn

Corresponding authors:

Li Shuying, female, professor; research interest: pathogenic mechanisms of virus-related diseases. E-mail: lsy5001@sina.com
Cao Pei, female, associate research fellow; research interest: food safety risk assessment. E-mail: caopei@cfsa.net.cn

CLC number:

R155

Funding:

National Key Research and Development Program of China (2023YFF1103803); Guangxi Science and Technology Program (桂科AB24263001)

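    Abstract (translated from the Chinese):

    Objective To use machine learning algorithms to build a binary classification model based on binding capacity to the membrane estrogen receptor (G protein-coupled estrogen receptor, GPER), for efficient and accurate prediction of whether endocrine-disrupting chemicals (EDCs) bind GPER. Methods Binding data for 224 compounds with GPER were collected. Based on molecular descriptors and MACCS fingerprints, binary classification models were built with six machine learning algorithms: random forest (RF), back-propagation artificial neural network (ANN-BP), extreme gradient boosting (XGBoost), support vector machine (SVM), k-nearest neighbors (k-NN), and linear discriminant analysis (LDA). Results The RF, SVM, ANN-BP, k-NN, and XGBoost models built with MACCS fingerprints all achieved accuracies above 90% and AUC values above 92% in 10-fold cross-validation, and the RF model reached an accuracy of 85% on the test set. SHAP plot analysis indicated that the presence of at least one hydrogen-bearing oxygen atom, a ring of eight or more members, and a ring carbon center of tertiary (or higher) connectivity favors binding to GPER. Conclusion The machine learning models built in this study provide a reliable tool for high-throughput screening of estrogenic endocrine disruptors in food. The key molecular features identified deepen the understanding of how EDCs interact with estrogen receptors and can provide a theoretical basis for targeted monitoring and risk early warning of emerging contaminants.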

    Abstract:

    Objective To develop an accurate and efficient binary classification model for predicting the binding capacity of endocrine-disrupting chemicals (EDCs) to the G protein-coupled estrogen receptor (GPER). Methods GPER binding data for 224 compounds were collected. Based on molecular descriptors and Molecular ACCess System (MACCS) fingerprints, six machine learning algorithms were used to construct binary prediction models: random forest (RF), back-propagation artificial neural network (ANN-BP), extreme gradient boosting (XGBoost), support vector machine (SVM), k-nearest neighbors (k-NN), and linear discriminant analysis (LDA). Results The RF, SVM, ANN-BP, k-NN, and XGBoost models built with MACCS fingerprints achieved accuracies >90% and area under the curve (AUC) values >92% in 10-fold cross-validation, and the RF model reached 85% accuracy on the external test set. SHapley Additive exPlanations (SHAP) analysis indicated that molecules containing at least one hydrogen-bearing oxygen atom, a ring of eight or more members, and a ring carbon center of tertiary (or higher) connectivity are favorable for GPER binding. Conclusion Based on structural representation and model performance, the RF classification model built on MACCS fingerprints was identified as the optimal model. The structural features highlighted by SHAP analysis (a hydrogen-bearing oxygen atom, rings of eight or more members, and ring carbon centers of tertiary or higher connectivity) may mimic certain structural features of estrogen, thereby facilitating interaction with estrogen receptors. These findings provide a theoretical foundation for understanding the endocrine disruption mechanisms of EDCs and offer a methodological tool for the targeted screening of emerging contaminants in food.
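The evaluation protocol named in the Methods and Results (binary labels, fingerprint features, 10-fold cross-validation, accuracy and AUC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: the data are synthetic, the fingerprints are hypothetical 166-bit toy vectors standing in for real MACCS keys, and a simple Tanimoto-based k-NN stands in for the six algorithms listed in the abstract. All function names below are illustrative assumptions.

```python
import random

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_score(train, query, k=5):
    """Fraction of the k most similar training compounds labelled as binders (1)."""
    nearest = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)[:k]
    return sum(label for _, label in nearest) / len(nearest)

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold has a similar class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(fps, labels, n_folds=10, k=5):
    """Return (accuracy, AUC) computed on pooled held-out predictions from all folds."""
    scores = [0.0] * len(labels)
    for fold in stratified_folds(labels, n_folds):
        held_out = set(fold)
        train = [(fps[i], labels[i]) for i in range(len(labels)) if i not in held_out]
        for i in fold:
            scores[i] = knn_score(train, fps[i], k)
    preds = [1 if s >= 0.5 else 0 for s in scores]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return accuracy, auc(labels, scores)

def make_toy_data(n=224, n_bits=166, seed=1):
    """Synthetic stand-in data: binders favour bits 0-19, non-binders bits 20-39."""
    rng = random.Random(seed)
    fps, labels = [], []
    for i in range(n):
        y = i % 2
        favoured = range(0, 20) if y == 1 else range(20, 40)
        bits = {b for b in favoured if rng.random() < 0.7}
        bits |= {b for b in range(40, n_bits) if rng.random() < 0.1}  # shared noise bits
        fps.append(bits)
        labels.append(y)
    return fps, labels

fps, labels = make_toy_data()
accuracy, auc_value = cross_validate(fps, labels)
print(f"10-fold CV accuracy: {accuracy:.3f}, AUC: {auc_value:.3f}")
```

Scores here are pooled across folds before computing AUC; averaging per-fold AUC values is an equally common choice, and the abstract does not say which was used. A real workflow would presumably derive the fingerprints from chemical structures with a cheminformatics toolkit and hold out a separate test set, as the abstract describes.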

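Cite this article

Gao Mengmeng, Zhang Lei, Wu Nan, Li Na, Pan Fei, Chen Rongji, Li Shuying, Cao Pei. A classification model based on G protein-coupled estrogen receptor binding affinity: development and validation[J]. Chinese Journal of Food Hygiene, 2025, 37(11): 1022-1033.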

History
  • Received: 2025-10-18
  • Published online: 2026-04-02