Association analysis between exposure to common pesticides, veterinary drugs and chemical pollutants in humans and diabetes mellitus based on XGBoost
Authors:

Lu Yuhong, Li Zizi, Liu Zhilin, Su Chang, Wang Huijun, Zhang Bing, Hou Yan

Affiliations:

1. Department of Biostatistics, School of Public Health, Peking University, Beijing 100041, China; 2. National Institute for Nutrition and Health, Chinese Center for Disease Control and Prevention / Key Laboratory of Trace Element Nutrition of National Health Commission, Beijing 100050, China; 3. Department of Biostatistics, School of Public Health, Peking University / Peking University Clinical Research Center, Beijing 100041, China

Author biography:

Lu Yuhong, female, master's student; research interest: biostatistics. E-mail: 2719836467@qq.com

Corresponding author:

Su Chang, male, researcher; research interest: nutrition and food hygiene. E-mail: suchang@ninh.chinacdc.cn

Chinese Library Classification number:

R155

Funding:

National Key Research and Development Program of China (2019YFC1605100); National Natural Science Foundation of China (81573155, 82173615)

    Abstract:

    Objective To explore the association between human exposure to pesticides, veterinary drugs and chemical pollutants and the risk of having diabetes mellitus, using Lasso variable selection and the XGBoost model. Methods Data came from the survey "Study on the Appropriate Amount of Physical Activity to Reduce the Risk of Nutrition-Related Chronic Diseases in Overweight Adults", conducted in Shijiazhuang and Hangzhou, China, in 2018-2019. A total of 86 participants with diabetes and 410 participants without diabetes were selected, and their basic personal information from the questionnaire, physical measurements, blood biochemistry data, and serum concentrations of pesticides, veterinary drugs and chemical pollutants were extracted. Variables were screened with Lasso and then entered into a logistic regression model and an XGBoost model. Model fit was compared using the area under the curve (AUC), and the variables were ranked by importance. Results Lasso selected three exposures as related to diabetes: 2-ethylhexyl diphenyl phosphate (EHDPP), perfluorooctanoic acid (PFOA) and perfluoroundecanoic acid (PFUdA), with importance ranked PFOA > EHDPP > PFUdA. The XGBoost model (AUC = 0.83) performed significantly better than the logistic regression model (AUC = 0.64) on this dataset (P < 0.05). Conclusion Lasso is suitable for screening factors that influence diabetes, and the XGBoost model fits complex data well. EHDPP, PFOA and PFUdA are important influencing factors for diabetes in this population.
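    The abstract compares the two models by AUC. As a minimal sketch of what that metric measures, the block below computes AUC directly as the probability that a randomly chosen participant with diabetes receives a higher predicted risk than a randomly chosen participant without it. The function name `auc_score` and the toy labels and scores are hypothetical illustrations, not the study's data or code, and the sketch does not reproduce the significance test behind the reported P value.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (case, control) pairs in which the case scores higher.

    Ties between a case and a control count as half a win, which makes this
    equal to the Mann-Whitney U statistic divided by n_cases * n_controls.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = diabetes, 0 = no diabetes.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Made-up predicted risks standing in for two competing models.
model_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
model_b = [0.6, 0.3, 0.2, 0.7, 0.5, 0.4, 0.1, 0.1]

auc_a = auc_score(labels, model_a)
auc_b = auc_score(labels, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

    An AUC of 0.5 corresponds to ranking cases and controls at random, and 1.0 to ranking every case above every control. The pairwise loop is quadratic in sample size, which is acceptable for a few hundred participants; a rank-based computation would be the usual choice for larger data.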

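Cite this article

Lu Yuhong, Li Zizi, Liu Zhilin, Su Chang, Wang Huijun, Zhang Bing, Hou Yan. Association analysis between exposure to common pesticides, veterinary drugs and chemical pollutants in humans and diabetes mellitus based on XGBoost [J]. Chinese Journal of Food Hygiene, 2023, 35(5): 652-657.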

History
  • Received: 2022-05-16
  • Published online: 2023-08-14