Machine learning-based risk factor analysis of adverse birth outcomes in very low birth weight infants

Participants and variables

Data consisted of 10,423 VLBW infants from the Korean Neonatal Network (KNN) database, covering January 2013 to December 2017. The KNN started in April 2013 as a national prospective cohort registry of VLBW infants admitted or transferred to neonatal intensive care units across South Korea; it now covers 74 neonatal intensive care units. It collects the perinatal and neonatal data of VLBW infants based on a standardized operating procedure [37].

Five adverse birth outcomes were considered as binary dependent variables (no, yes), i.e., gestational age less than 28 weeks (GA < 28), GA less than 26 weeks (GA < 26), birth weight less than 1000 g (BW < 1000), BW less than 750 g (BW < 750) and SGA. Thirty-three predictors were included: sex-male (no, yes), birth-year (2013, 2014, 2015, 2016, 2017), birth-month (1, 2, …, 12), birth-season-spring (no, yes), birth-season-summer (no, yes), birth-season-autumn (no, yes), birth-season-winter (no, yes), number of fetuses (1, 2, 3, 4 or more), in vitro fertilization (no, yes), gestational diabetes mellitus (no, yes), overt diabetes mellitus (no, yes), pregnancy-induced hypertension (no, yes), chronic hypertension (no, yes), chorioamnionitis (no, yes), prelabor rupture of membranes (no, yes), prelabor rupture of membranes > 18 h (no, yes), antenatal steroid (no, yes), cesarean section (no, yes), oligohydramnios (no, yes), polyhydramnios (no, yes), maternal age (years), primipara (no, yes), maternal education (elementary, junior high, senior high, college or higher), maternal citizenship (Korea, Vietnam, China, Philippines, Japan, Cambodia, United States, Thailand, Mongolia, Other), paternal education (elementary, junior high, senior high, college or higher), paternal citizenship (Korea, Vietnam, China, Philippines, Japan, Cambodia, United States, Thailand, Mongolia, Other), unmarried (no, yes), congenital infection (no, yes), PM10 year (PM10 for each year), PM10 month (PM10 for each birth-month), temperature average (for each year), temperature min (for each year) and temperature max (for each year). PM10 and temperature data came from the Korea Meteorological Administration (PM10: https://data.kma.go.kr/data/climate/selectDustRltmList.do?pgmNo=68; temperature: https://web.kma.go.kr/weather/climate/past_cal.jsp). The definition of each variable is given in Text S1, supplementary text.
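As an illustration of how the threshold-based outcomes and the season indicators above could be coded, the following is a minimal Python sketch. The field names, the helper functions and the month-to-season mapping (March to May as spring, and so on) are assumptions made for this example; the study's own definitions are those in Text S1, and SGA is omitted here because its definition is not given in this section.

```python
from dataclasses import dataclass


@dataclass
class Infant:
    # Hypothetical field names; the KNN registry's actual column names may differ.
    gestational_age_weeks: float
    birth_weight_g: float
    birth_month: int  # 1..12


def outcome_flags(infant: Infant) -> dict:
    """Code the four threshold-based outcomes as 0 (no) or 1 (yes)."""
    return {
        "GA<28": int(infant.gestational_age_weeks < 28),
        "GA<26": int(infant.gestational_age_weeks < 26),
        "BW<1000": int(infant.birth_weight_g < 1000),
        "BW<750": int(infant.birth_weight_g < 750),
    }


def season_flags(birth_month: int) -> dict:
    """One indicator per season. The month ranges are an assumption."""
    seasons = {
        "spring": (3, 4, 5),
        "summer": (6, 7, 8),
        "autumn": (9, 10, 11),
        "winter": (12, 1, 2),
    }
    return {
        f"birth-season-{name}": int(birth_month in months)
        for name, months in seasons.items()
    }


if __name__ == "__main__":
    example = Infant(gestational_age_weeks=27.0, birth_weight_g=900.0, birth_month=4)
    print(outcome_flags(example))
    print(season_flags(example.birth_month))
```

Exactly one of the four season indicators equals 1 for any birth month, which mirrors the four separate birth-season predictors listed above.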

Statistical analysis

The artificial neural network, the decision tree, the logistic regression, the naïve Bayes, the random forest and the support vector machine were used for predicting preterm birth [38,39,40,41,42,43]. A decision tree includes three elements, i.e., a test on an independent variable (intermediate node), an outcome of the test (branch) and a value of the dependent variable (terminal node). A naïve Bayesian classifier performs classification on the basis of Bayes' theorem. Here, the theorem states that the probability of the dependent variable given certain values of independent variables can be calculated based on the probabilities of the independent variables given a certain value of the dependent variable. A random forest is a collection of many decision trees, each trained on a bootstrap sample of the data, which make majority votes on the dependent variable ("bootstrap aggregation"). Let us take a random forest with 1000 decision trees as an example. Let us assume that the original data includes 10,000 participants. Then, the training and testing of this random forest takes two steps. Firstly, a new data set with 10,000 participants is created based on random sampling with replacement, and a decision tree is created based on this new data set. Here, some participants in the original data would be excluded from the new data set and these leftovers are called out-of-bag data. This process is repeated 1000 times, i.e., 1000 new data sets, 1000 decision trees and 1000 sets of out-of-bag data are created. Secondly, each participant is predicted by the decision trees for which that participant was out-of-bag, their majority vote is taken as the final prediction on this participant, and the out-of-bag error is calculated as the proportion of wrong final predictions over all participants in the out-of-bag data [38,39].
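The two-step bootstrap and out-of-bag procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the study's implementation: each "tree" is a one-split stump on a single predictor, the function names are hypothetical, and the random selection of candidate predictors at each split that a full random forest performs is left out.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Choose the threshold t minimizing training errors of the rule 'x > t means 1'."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in rows}):
        err = sum((x > t) != bool(y) for x, y in rows)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def bagged_oob_error(data, n_trees=100, seed=0):
    """Bagging with out-of-bag (OOB) evaluation on (x, y) pairs with y in {0, 1}."""
    rng = random.Random(seed)
    n = len(data)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        # Step 1: bootstrap sample of size n, drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        threshold = fit_stump([data[i] for i in idx])
        # Step 2: this tree votes only on participants it never saw (OOB).
        for i in range(n):
            if i not in in_bag:
                votes[i][int(data[i][0] > threshold)] += 1
    # Majority vote per participant, then the share of wrong final predictions.
    scored = [
        (v.most_common(1)[0][0], data[i][1]) for i, v in enumerate(votes) if v
    ]
    return sum(pred != y for pred, y in scored) / len(scored)


if __name__ == "__main__":
    # Toy data: the outcome is 1 exactly when the single predictor exceeds 50.
    toy = [(x, int(x > 50)) for x in range(100)]
    print(bagged_oob_error(toy, n_trees=50, seed=1))
```

Because each bootstrap sample leaves out roughly a third of the participants, nearly every participant receives out-of-bag votes from many trees, which is what lets the out-of-bag error act as a built-in validation estimate.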

A support vector machine estimates a "hyperplane", that is, a line or space defined by a group of "support vectors". The hyperplane separates data with the greatest gap between various sub-groups. An artificial neural network consists of "neurons", information units combined through weights. In general, the artificial neural network includes one input layer, one, two or three intermediate layers and one output layer. Neurons in a previous layer link with neurons in the next layer through "weights", which denote the strengths of the linkages between neurons in a previous layer and their next-layer counterparts. This "feedforward" operation begins from the input layer, runs through intermediate layers and ends in the output layer. Then, this process is followed by learning: these weights are updated according to their contributions to the gap between the actual and predicted final outputs. This "backpropagation" operation begins from the output layer, runs through intermediate layers and ends in the input layer. The two processes are repeated until the performance measure reaches a certain limit [38,39].

Data on 10,423 observations with full information were divided into training and validation sets with a 70:30 ratio (7296 vs. 3127). Accuracy, the proportion of correct predictions among the 3127 validation observations, was employed as the standard for validating the models. Random forest variable importance, the contribution of a certain variable to the performance (Gini) of the random forest, was used for examining major predictors of adverse birth outcomes in VLBW infants, including PM10. The random split and analysis were repeated 50 times and the results were averaged for external validation [44,45]. R-Studio 1.3.959 (R-Studio Inc.: Boston, United States) was employed for the analysis during August 1, 2021–September 30, 2021.
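The validation scheme (a random 70:30 split, accuracy on the validation set, 50 repetitions, then the average) can be sketched as follows. The analysis itself was run in R; this Python version is only an illustration, and the function names and the placeholder majority-class model are assumptions made for the example rather than anything taken from the study.

```python
import random
from statistics import mean


def split_70_30(n, rng):
    """Shuffle the indices 0..n-1 and cut them into 70% training, 30% validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(0.7 * n)
    return idx[:n_train], idx[n_train:]


def accuracy(y_true, y_pred):
    """Proportion of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_validation(X, y, fit, predict, n_repeats=50, seed=0):
    """Average validation accuracy over repeated random 70:30 splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        train, valid = split_70_30(len(y), rng)
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in valid])
        scores.append(accuracy([y[i] for i in valid], preds))
    return mean(scores)


# Placeholder model (an assumption): always predicts the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_valid):
    return [model] * len(X_valid)


if __name__ == "__main__":
    train, valid = split_70_30(10423, random.Random(0))
    print(len(train), len(valid))  # sizes of the two sets for 10,423 observations
    X = list(range(100))
    y = [0] * 80 + [1] * 20
    print(repeated_validation(X, y, fit_majority, predict_majority))
```

Any of the six classifiers could be passed in through `fit` and `predict`; averaging over the repetitions reduces the dependence of the reported accuracy on one particular random split.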

Ethics statement

The KNN registry was approved by the institutional review board (IRB) at each participating hospital (IRB No. of Korea University Anam Hospital: 2013AN0115). Informed consent was obtained from the parent(s) of each infant registered in the KNN. All methods were carried out in accordance with the IRB-approved protocol and in compliance with relevant guidelines and regulations.

The institutional review boards of the KNN participating hospitals were as follows: Gachon University Gil Medical Center, The Catholic University of Korea Bucheon ST. Mary’s Hospital, The Catholic University of Korea Seoul ST. Mary’s Hospital, The Catholic University of Korea ST. Vincent’s Hospital, The Catholic University of Korea Yeouido ST. Mary’s Hospital, The Catholic University of Korea Uijeongbu ST. Mary’s Hospital, Gangnam Severance Hospital, Kyung Hee University Hospital at Gangdong, GangNeung Asan Hospital, Kangbuk Samsung Hospital, Kangwon National University Hospital, Konkuk University Medical Center, Konyang University Hospital, Kyungpook National University Hospital, Gyeongsang National University Hospital, Kyung Hee University Medical Center, Keimyung University Dongsan Medical Center, Korea University Guro Hospital, Korea University Ansan Hospital, Korea University Anam Hospital, Kosin University Gospel Hospital, National Health Insurance Service Ilsan Hospital, Daegu Catholic University Medical Center, Dongguk University Ilsan Hospital, Dong-A University Hospital, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Pusan National University Hospital, Busan ST. Mary’s Hospital, Seoul National University Bundang Hospital, Samsung Medical Center, Samsung Changwon Medical Center, Seoul National University Hospital, Asan Medical Center, Sungae Hospital, Severance Hospital, Soonchunhyang University Hospital Bucheon, Soonchunhyang University Hospital Seoul, Soonchunhyang University Hospital Cheonan, Ajou University Hospital, Pusan National University Children’s Hospital, Yeungnam University Hospital, Ulsan University Hospital, Wonkwang University School of Medicine & Hospital, Wonju Severance Christian Hospital, Eulji University Hospital, Eulji General Hospital, Ewha Womans University Medical Center, Inje University Busan Paik Hospital, Inje University Sanggye Paik Hospital, Inje University Ilsan Paik Hospital, Inje University Haeundae Paik Hospital, Inha University Hospital, Chonnam National University Hospital, Chonbuk National University Hospital, Cheil General Hospital & Women’s Healthcare Center, Jeju National University Hospital, Chosun University Hospital, Chung-Ang University Hospital, CHA Gangnam Medical Center, CHA University, CHA Bundang Medical Center, CHA University, Chungnam National University Hospital, Chungbuk National University, Kyungpook National University Chilgok Hospital, Kangnam Sacred Heart Hospital, Kangdong Sacred Heart Hospital, Hanyang University Guri Hospital, and Hanyang University Medical Center.
