
Predicting depression and unravelling its heterogeneous influences in middle-aged and older people populations: a machine learning approach

Abstract

Background

Aging has become a global trend, and depression, as an accompanying issue, poses a significant threat to the health of middle-aged and older adults. Existing studies primarily rely on statistical methods such as logistic regression for small-scale data analysis, while research on the application of machine learning in large-scale data remains limited. Therefore, this study employs machine learning methods to explore the risk factors for depression among middle-aged and older adults in China.

Methods

Using a two-step hybrid model combining long short-term memory (LSTM) and machine learning (ML), we compared 20 depression risk/protective factors in a balanced panel dataset of middle-aged and elderly Chinese adults (N = 3706; aged 45–94; 64.65% female; 41.20% middle-aged) from the China Health and Retirement Longitudinal Study (CHARLS). Data were collected across five waves (2011, 2013, 2015, 2018, and 2020). The LSTM model predicted risk factors for the fifth wave via data from the preceding four waves. Five ML models were then used to classify depression (yes/no) based on these factors, which included demographic, lifestyle, health, and socioeconomic variables.

Results

The LSTM model effectively predicted depression-related variables (mean squared error = 0.067). The average AUC of the five ML models ranged from 0.78 to 0.82. The key predictive factors were disability, life satisfaction, activities of daily living (ADL) impairment, chronic diseases, and self-reported memory. For the middle-aged group, the top three factors were disability, life satisfaction, and chronic diseases; for the older group, they were life satisfaction, chronic diseases, and ADL impairment.

Conclusion

The two-step hybrid model ("LSTM + ML") effectively predicted depression over 2 years via demographic and health data, aiding early diagnosis and intervention.


Introduction

Depression is one of the most prevalent mental health issues worldwide and contributes substantially to the global burden of disease [1]. Approximately 311 million people have been diagnosed with depression globally [2], representing an increase of over 18% over the past decade [3, 4]. Depression ranks as the second leading cause of disability worldwide, accounting for 3.0% of global disability-adjusted life years [5]. Given its distinctive physiological characteristics and social implications, the prevalence of depression is highest among middle-aged and older individuals aged 55–74 years [6, 7]. Consequently, there is a pressing need for heightened societal attention to the potential risks of depression within this demographic.

Owing to the sizable and rapidly growing older population in China, the prevalence of depression is relatively high [8]. By conservative estimates, the number of people with depression in China has exceeded 54 million, with a depression incidence rate of 22.7% among those aged 60 and above. However, more than 90% of patients fail to receive regular treatment promptly [9]. Depression typically has a severe effect on the psychological and physical health of older adults, compromising their quality of life, increasing mortality rates, and incurring substantial healthcare costs [10]. Therefore, early prediction of depression risk among future older adults has significant implications for hospitals, nursing homes, communities, and families, facilitating timely and effective prevention and intervention measures that can improve the quality of life of older adults and reduce medical expenses [11].

Currently, research on the prediction of depression among the older population has focused primarily on three approaches. First, the diagnosis of depression via biological indicators, although highly accurate, fails to effectively predict future depression risk [12, 13]. Second, cross-sectional statistical data are employed to explore the relationships between depression and risk factors; however, such data cannot accurately predict future depression risk [14]. Third, depression analysis is conducted via semantic recognition and data mining techniques on the basis of information posted by users on social media platforms; however, the applicability of this method is limited, particularly for the older population. The utility of these three types of research findings remains constrained, especially concerning early intervention and timely treatment for depression. Hence, the prediction of depression among different older populations via demographic characteristics and general health information is highly important.

In the present study, risk factors for depression in older adults are categorized into three main groups: (a) demographic factors such as age [15], sex [16], marital status [17], and place of residence [18]; (b) health-related risk factors such as self-rated health status [19], activities of daily living (ADL) impairment [20], alcohol consumption [21], smoking [22], sleep duration [23], and social engagement [24]; and (c) risk factors related to chronic diseases such as cardiovascular diseases [25], kidney diseases [26], diabetes [27], arthritis [28], and dementia [29]. Although many studies have analysed the interrelationships among risk factors for depression, many methods, such as the logistic model, have been limited to linear relationships. In reality, these factors are temporally correlated. However, previous research has focused mainly on individual time points and has often failed to capture temporal changes, so it may not fully reflect the dynamic and stable effects of these features on depression. Machine learning and deep learning methods are iterative and can analyse the nonlinear and high-dimensional correlations among risk factors simultaneously, capturing the temporal relationships among them [30]. While some studies have used long short-term memory (LSTM) models to capture the temporal changes in predictor variables [31, 32], their classification step still relies on conventional machine learning models, which have relatively weak capabilities in handling nonlinear and high-dimensional correlations, resulting in limited overall performance. Therefore, we introduce a deep learning classification method, a convolutional neural network (CNN). Through convolution operations and hierarchical feature extraction, a CNN can better capture the complex relationships among risk factors for depression, thereby improving predictive performance [33].

To the best of our knowledge, longitudinal studies on the prediction of depression in different age groups over the coming years are relatively scarce. Research has focused primarily on the older population [31, 32, 34], with comparatively less attention given to middle-aged individuals, thus resulting in a dearth of exploration into the differences in risk/protective factors for depression prediction between middle-aged and older populations. Middle-aged and older individuals exhibit significant differences in life stages, physical health status, and social relationships. Middle-aged individuals often experience psychological adaptation issues and anxiety when facing life changes such as retirement [35] and when children become independent [36], whereas older adults are more susceptible to the effects of physical health problems, social isolation, and widowhood [37]. Additionally, middle-aged individuals may prioritize quality of life and identity [38], whereas older adults may place more emphasis on physical health and family ties [39]. These differences may influence the incidence and manifestation of depression, and ignoring them may lead to misjudgement of depression risk and inappropriate intervention measures. Therefore, there is an urgent need for a comparative analysis of depression risk factors in these two age groups through longitudinal prediction.

Currently, there is limited research on estimating depression among family-based older populations. However, owing to the prevailing Confucian ethic of "filial piety" in China, family-based care will remain the primary choice for older adults in the future [40]. Thus, this study uses a large-scale representative database of families with older adults in China as a sample to address the following aims: (1) using long short-term memory (LSTM) models to capture time series information and predict the levels of different depression risk factors in middle-aged and older populations over the next two years; (2) comparing the performance of five machine learning algorithms in predicting depression episodes among older adults to identify the most effective one for subsequent risk factor interpretation; (3) identifying which features are risk/protective factors for depression in middle-aged and older populations via longitudinal data from a large community-based dataset in China, including demographic, lifestyle, and health status data; and (4) investigating differences in depression risk/protective factors between middle-aged (aged 45–60) and older (aged 60+) individuals.

Materials and methods

Data description

This study utilized data from the China Health and Retirement Longitudinal Study (CHARLS) for analysis. CHARLS is a large-scale interdisciplinary survey project representative of China [41]. The CHARLS study obtained ethical approval from Peking University’s Biomedical Ethics Committee (IRB 00001052–11015), and all participants provided informed consent before data collection. For each wave of data, we conducted Little’s MCAR test on all variables, confirming that the missing data followed a Missing Completely at Random (MCAR) pattern. Given this, missing values could be safely deleted or imputed [23]. We opted to exclude samples with missing values in either independent or dependent variables from the 2011, 2013, 2015, 2018, and 2020 waves. As shown in Fig. 1, after removing cases with missing data and attrition, we obtained a balanced panel dataset comprising 3,706 middle-aged and older adults, ranging in age from 45 to 94 years, with 64.65% being female. Of the total sample, middle-aged individuals (45–60 years) accounted for 41.20%, while older adults (60+ years) constituted 58.80%. When processing the data, we did not exclude participants with depression in any of the four waves from 2011 to 2018. While excluding baseline depression cases might aid in model interpretation, the dynamic nature of depression over time is a crucial aspect of our study. Moreover, in real-world scenarios, individuals with depression are part of the population and should be included in analyses [31, 34].
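The complete-case filtering described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the function names `is_complete` and `balanced_panel`, and the variable names in the usage note are all hypothetical.

```python
# Minimal sketch of building a balanced panel by complete-case filtering.
# Assumed (hypothetical) layout: records[participant_id][wave] is a dict
# mapping variable names to values, with None marking a missing value.

WAVES = (2011, 2013, 2015, 2018, 2020)  # the five waves named in the text


def is_complete(row, required):
    """True if the wave record exists and every required variable is present."""
    return row is not None and all(row.get(var) is not None for var in required)


def balanced_panel(records, required, waves=WAVES):
    """Return the sorted IDs of participants with complete data in every wave."""
    return sorted(
        pid
        for pid, by_wave in records.items()
        if all(is_complete(by_wave.get(wave), required) for wave in waves)
    )
```

For example, a participant who dropped out before 2020, or who has one missing value in 2015, would be excluded, while a participant observed completely in all five waves would be kept.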

Fig. 1

A flow chart for study population selection (a balanced five-wave panel was constructed without intentionally excluding participants with depression in 2011, 2013, 2015, and 2018)

Outcome variables

In this study, the Chinese version of the Center for Epidemiologic Studies Depression Scale-10 (CES-D10) was utilized to assess participants' levels of depression [42]. The Chinese version of the CES-D10 is a commonly used 10-item scale in epidemiological research. Consistent with previous studies [32, 43], participants scoring 11 or higher on the CES-D10 were classified as having depression [44].
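The cut-off rule can be written as a short function. The sketch below assumes that each of the 10 items is coded 0–3 and that positively worded items have already been reverse-scored; the source states only the threshold of 11, so the coding and the function name are assumptions.

```python
# Minimal sketch of the CES-D10 classification rule (score >= 11 -> depression).
# Assumptions (not stated in the source): items are coded 0-3 and positively
# worded items have already been reverse-scored before calling this function.

CESD10_CUTOFF = 11  # threshold given in the text


def classify_depression(item_scores):
    """Return True if the CES-D10 total meets the depression cut-off."""
    if len(item_scores) != 10:
        raise ValueError("CES-D10 requires exactly 10 item scores")
    if any(not 0 <= score <= 3 for score in item_scores):
        raise ValueError("each item score must lie in the range 0-3")
    return sum(item_scores) >= CESD10_CUTOFF
```

Under this coding the total ranges from 0 to 30, so a participant answering 1 on every item (total 10) falls just below the threshold.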

Predictor variables

In our study, because the structure of the questionnaire variables varied across waves, we retained only predictor variables that were measured consistently in Waves 1–5. Ultimately, following similar studies in the literature [31, 34, 45], we selected 18 variables (grouped into three categories) as predictor factors. The detailed information on these variables is as follows:

(a) Demographic variables included age, sex, rural/urban community, and marital status. Age and sex were considered auxiliary inputs, as they are variables that do not need to be predicted over time. Marital status was categorized as married (married/cohabiting) or unmarried (never married/divorced/separated and widowed) based on the 2011 China Health Statistics Yearbook.

(b) Socioeconomic variables included education level, occupational status, and medical insurance (yes/no). Education level was divided into low (primary school and below) and high (junior high school and above). Occupational status was classified into agricultural work, nonagricultural work, and retired/not working.

(c) Lifestyle and health-related variables included life satisfaction, self-rated health status, social activities in the past month, self-rated memory, smoking, drinking, medical service experience in the past month, sleep duration, activities of daily living (ADL) difficulties, chronic diseases, and disability as predictor factors. Consistent with prior research, ADL impairment was measured by asking participants if they encountered difficulties in activities such as bathing, dressing, eating, getting in and out of bed, using the toilet, engaging in bowel movements, performing housework, making phone calls, taking medications, shopping, and managing finances over the past three months, with total scores ranging from 20–80. Social activity experiences (in the past month) encompassed interactions with friends, community recreation, volunteering or charity work, training, and stock fund investments, among others. Chronic diseases were defined on the basis of whether the participants were diagnosed with cardiovascular diseases, kidney diseases, liver diseases, digestive system diseases, asthma, arthritis, and mental and memory disorders, among other nine chronic illnesses. Medical service experience referred to whether participants had visited hospitals, clinics, or outpatient departments in the past month.

Statistical analysis

Model and evaluation

This study's depression prediction task is based on balanced panel data and involves two main steps (Fig. 2). Given that our analysis involves both longitudinal (LSTM) and cross-sectional (machine learning) models, ensuring data consistency across waves is crucial. LSTM requires complete temporal sequences to effectively learn time-dependent patterns, whereas traditional ML models such as RF, logistic regression, and XGBoost are designed for independent observations rather than sequential data. To maintain model comparability and ensure valid inference, we opted to construct a balanced panel dataset. Additionally, Little’s MCAR test confirmed that missing data followed a completely random pattern (MCAR), allowing for safe deletion or imputation of missing values. Based on this, we excluded samples with missing values across all waves (2011, 2013, 2015, 2018, and 2020) to ensure a consistent dataset for both deep learning and machine learning models.

Fig. 2

The architecture of the hybrid model for the estimation of depression. The raw dataset used Waves 1–4 depression data, and the output predicted data of Wave 4 and the outcome of Wave 5 were constructed into a new dataset. MSE: mean squared error; ML: machine learning; LR: logistic regression; XGBOOST: extreme gradient boosting; RF: random forest; SVM: support vector machine; CNN: convolutional neural network

In the first stage, we use the first four waves of data as the original dataset and employ an LSTM algorithm to predict the feature values for 2020. Specifically, we use waves 1–3 as features and the depression status from wave 4 as the label to train and test the LSTM model. Using the trained model, we predict the feature variables for the fifth wave (2020) on the basis of the first four waves of data.

In the second stage, we employ five common machine learning (ML) methods to investigate whether the features predicted from waves 1–4 can accurately predict the depression status in wave 5 (2020). Specifically, in this stage, we combine the LSTM-predicted feature variables with age, gender (time-invariant variables), and depression status from wave 5 to form a new dataset. Using the depression status from wave 5 as the label, we perform tenfold cross-validation to tune the hyperparameters of the five ML methods and empirically analyse the classification performance of these methods in predicting depression.

For each analysis, the entire dataset was randomly split into a training set (80% of the total sample) and a test set (20%). Following standard machine learning protocols, we conducted tenfold cross-validation and hyperparameter tuning on the training data. Details of the LSTM hyperparameter tuning can be found in Supplementary Table 1, while the hyperparameter tuning for the five machine learning models is presented in Supplementary Table 2. Model performance was assessed via accuracy, sensitivity, positive predictive value (PPV), and area under the ROC curve (AUROC). Additionally, the Brier score was selected as a calibration metric. The machine learning methods were developed and evaluated in Python 3.8 with the PyTorch and scikit-learn packages. Feature importance was determined by computing Shapley values with the SHAP package, which were visualized through bee swarm plots and bar charts.
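To make the evaluation metrics concrete, the sketch below computes them from scratch in plain Python. It is illustrative only: the study reports using scikit-learn, the 0.5 classification threshold is an assumption, and the helper names `confusion_counts` and `evaluate` are hypothetical. It also assumes both classes and at least one positive prediction are present, so no denominator is zero.

```python
# Minimal sketch of the reported metrics for a binary classifier.
# Labels are coded 0/1; y_prob holds predicted probabilities of depression.


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute accuracy, sensitivity, PPV, Brier score, and AUROC."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    # AUROC as the probability that a random positive case is scored above
    # a random negative case, counting ties as one half.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
        "auroc": wins / (len(pos) * len(neg)),
    }
```

Accuracy, sensitivity, and PPV depend on the chosen threshold, whereas the Brier score and AUROC are computed directly from the predicted probabilities.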

LSTM

Traditional machine learning models are typically designed to handle static data, where each sample is assumed to be independently and identically distributed, without considering temporal dependencies. As a result, these models often struggle with processing time-series data. Long Short-Term Memory (LSTM) networks address this limitation through memory cells and a gating mechanism, including the input gate, forget gate, and output gate, which enable the model to store long-term information while filtering out irrelevant details. In our study, LSTM is used to capture how chronic diseases, SAPH, and LS evolve over time, identifying patterns in their longitudinal interactions. The forget gate determines which information should be discarded, the input gate regulates the incorporation of new information, and the output gate generates the current hidden state. This structure allows LSTM to effectively capture dynamic temporal relationships, mitigate the vanishing gradient problem, and achieve superior performance in time-series forecasting tasks [46].
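For reference, the gating mechanism described above is commonly written as follows. This is the standard textbook LSTM formulation rather than a specification of the authors' exact architecture; the symbols are assumptions introduced here: $x_t$ is the input at wave $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W$, $U$, $b$ learned parameters.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The additive form of the cell state update is what lets gradients flow across many time steps, which is the usual explanation for why LSTMs mitigate the vanishing gradient problem.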

Logistic regression

Logistic regression (LR) is a common statistical learning method for binary classification problems. LR models the log-odds of the outcome as a linear combination of the input features and maps it through the logistic function to a probability in [0, 1]. It is characterized by its simplicity, intuitiveness, and computational efficiency, but it has a limited ability to model nonlinear relationships in data.
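In symbols, the standard logistic regression model can be written as below. The notation is introduced here for illustration and does not come from the source: $p$ is the probability of depression, $x_1, \dots, x_k$ are the predictors, and $\beta_0, \dots, \beta_k$ are the fitted coefficients.

```latex
\log\frac{p}{1 - p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)\right)}
```

Because the log-odds are linear in the predictors, any interaction or nonlinear effect has to be added by hand as an extra term, which is the limitation noted above.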

Random forest

RF is a robust ensemble learning algorithm that makes predictions based on a collection of decision trees. Each decision tree is constructed by randomly selecting features and data samples and then combining the predictions of each tree through voting or averaging. This ensemble approach helps reduce overfitting and demonstrates good adaptability to high-dimensional data and datasets with complex interactions. Unlike LSTM, which captures temporal dependencies, RF is well-suited for analyzing non-sequential structured data and identifying key predictors through feature importance analysis. RF performs well in both classification and regression problems and exhibits efficiency and scalability when dealing with large datasets [47].

XGBOOST

XGBoost is a variant of the gradient-boosting decision tree (GBDT), which leverages gradient-boosting techniques to optimize model performance during training. Compared with traditional gradient boosting algorithms, XGBoost introduces regularization terms to prevent overfitting and employs more efficient approximation algorithms to increase training speed [47].

Convolutional neural network

A convolutional neural network (CNN) is a type of deep learning model that extracts features via components such as convolutional layers, pooling layers, and fully connected layers, and multiple such components are stacked to construct a deep network. CNNs can automatically learn features and perform well when handling large-scale datasets [33].
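The convolution and pooling operations mentioned above can be illustrated on a one-dimensional feature vector. The source does not specify the CNN architecture or how the tabular features are arranged, so the sketch below is a generic illustration with hypothetical function names, written in plain Python rather than PyTorch.

```python
# Minimal sketch of one convolution -> ReLU -> max-pooling stage on a 1D input.
# In a trained CNN the kernel weights and bias are learned; here they are fixed.


def conv1d(x, kernel, bias=0.0):
    """Valid (no padding) 1D cross-correlation of x with kernel."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(x) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]
```

Stacking several such stages and ending with a fully connected layer gives the layered structure described in the text.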

Results

Univariate correlation analysis of predictors of depression

Table 1 displays the results of univariate correlation analysis between predictive factors in 2018 and depression in 2020. The cross-sectional data from the 2020 China Health and Retirement Longitudinal Study included a total of 1527 (41.2%) middle-aged individuals and 2179 (58.8%) older individuals, comprising a total of 3706 participants. Among them, 1359 (36.7%) were classified as having depression. The prevalence of depression was significantly greater in females (42%) than in males (28%). In 2018, predictive factors such as rural/urban community status, marital status, education level, occupational status, medical insurance, life satisfaction, self-reported health status, social activities, smoking, alcohol consumption, self-rated memory, medical services, sleep duration, ADL impairment, chronic diseases, and disabilities exhibited significant between-group differences in the occurrence of depression in 2020.

Table 1 Predictors in 2018 and univariate analysis of their associations with depression in 2020

Prediction ability of ML models for depression in middle-aged and older individuals

The predictive performance of the LSTM is illustrated in Fig. 3. The mean squared error (MSE) on the validation set stabilizes at 0.067 during training, indicating good predictive performance of the LSTM. Table 2 summarizes the performance of the machine learning (ML) models in predicting the development of depressive symptoms among participants with complete data (sensitivity analysis).

Fig. 3

Predictive performance of LSTM for Depression. (LSTM training curve. The x-axis represents the number of epochs, and the y-axis represents the loss (mean square error, MSE). The blue line represents the training loss, and the orange line represents the validation loss.)

Table 2 Model performance in predicting courses of depressive symptoms for sensitivity analysis on only participants with complete data

Among the evaluated models, the deep learning model (CNN) achieves the highest accuracy (76.4%), surpassing the general machine learning models, among which the support vector machine (SVM) achieves the highest accuracy (74.2%) and the random forest the lowest. Sensitivity is the proportion of individuals with depression who are correctly identified. The CNN achieves the highest sensitivity at 58.1%, followed by the SVM at 50.0%. The positive predictive value (PPV) is the proportion of individuals classified as depressed who actually have depression (higher is better). The PPV ranges from 68.3% (random forest) to 72.2% (CNN). The Brier score, which measures the mean squared difference between the predicted probabilities and actual outcomes (lower is better), is lowest for the CNN at 0.161. The AUC is the area under the ROC curve and is used to assess the performance of binary classifiers, with values closer to 1 indicating better performance. The AUC ranges from 0.775 (random forest) to 0.804 (CNN), with the CNN exhibiting the best performance (Fig. 4A). The results of the decision curve analysis are shown in Fig. 4B. Within the threshold range of 0–1, the net benefit of the CNN is significantly greater than that of the other machine learning models.
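The net benefit plotted in a decision curve analysis is conventionally computed as the true-positive rate per participant minus the false-positive rate per participant weighted by the odds of the threshold probability. The sketch below uses that standard definition; the source does not give its formula, so treating it as the one used is an assumption, and the function names are hypothetical.

```python
# Minimal sketch of net benefit at a single threshold probability pt:
#   NB(pt) = TP / N - (FP / N) * pt / (1 - pt)
# Labels are coded 0/1; y_prob holds predicted probabilities of depression.


def net_benefit(y_true, y_prob, pt):
    """Net benefit of intervening on everyone with predicted risk >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= pt and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= pt and t == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference strategy: intervene on every participant regardless of risk."""
    return net_benefit(y_true, [1.0] * len(y_true), pt)
```

Evaluating `net_benefit` over a grid of thresholds for each model, alongside the treat-all and treat-none (net benefit 0) references, yields curves of the kind shown in a decision curve plot.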

Fig. 4

Predictive performance of the five machine learning models for depression. A ROC curves for the five machine learning models. The x-axis represents specificity (probability of a negative test given that the elderly did not have depression), and the y-axis represents sensitivity (probability of a positive test given that the elderly had depression). B Decision curve analysis for the five machine learning models. The x-axis represents the threshold probability of depression, and the y-axis represents the net benefit.

Overall, the CNN outperforms general machine learning models across all the metrics. Its significant advantage lies in handling complex nonlinear data patterns, attributed to its convolutional and pooling layer structure. By extracting features at different levels through multiple convolutional and pooling layers, CNNs achieve richer and more advanced data representations. This end-to-end learning approach enables CNNs to adapt better to data features and extract the most discriminative and predictive features, thereby enhancing model performance and generalization ability [48].

Although CNN models exhibit good overall performance, their impact in the field of data analysis remains limited because of the black-box nature of deep learning methods [48, 49]. In recent years, researchers have started using Shapley values to assess the relative contributions of each predictor in predictive models and elucidate their impact on outcomes [50]. The greatest advantage of Shapley values lies in their independence from the predictive model, making them applicable to any machine learning model [51]. Therefore, this study employs Shapley values to interpret the model results.
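To illustrate why Shapley values are model-agnostic, the sketch below computes them exactly for an arbitrary prediction function by enumerating feature subsets. It is a toy illustration under stated assumptions: absent features are replaced by a single baseline vector, and the function name `shapley_values` is hypothetical. The SHAP package relies on approximations and background datasets rather than this brute-force enumeration, which is only feasible for a handful of features.

```python
# Minimal sketch of exact Shapley values for one observation x.
# The model is any callable that maps a list of feature values to a number.
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Return one Shapley value per feature of x, relative to baseline."""
    n = len(x)

    def value(subset):
        # Features in the subset take their observed value; others the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                members = set(coalition)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi
```

The values sum to the difference between the prediction for `x` and the prediction for the baseline, and an interaction term is split equally between the features that take part in it.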

Importance of predictive variables for middle-aged and older people populations

This study employed the SHAP method to conduct an interpretive analysis of the CNN model with the best performance. Figure 5 presents the importance of various predictive factors within the CNN model in descending order. Figure 5A depicts the bee swarm plot, where SHAP values illustrate the impact of each feature on the model output (i.e., depression prediction) and demonstrate how these impacts vary with feature values. The y-axis lists the evaluated features, the x-axis shows the SHAP value, and the colors indicate the magnitude of the feature values: the farther a point lies from zero along the x-axis, the greater the impact of the feature on depression prediction. Figure 5B represents the SHAP summary bar plot, where average SHAP values display the average impact of each feature, providing a more intuitive view of each feature's contribution to the overall prediction. The results indicate that disability, life satisfaction, activities of daily living (ADL), self-rated health status, and self-reported memory are the five key predictive factors influencing depression in middle-aged and older individuals.

Fig. 5

Variable importance of the LSTM + CNN model for predicting depression in middle-aged and elderly populations. A SHAP values for middle-aged and elderly individuals. B Mean SHAP values for middle-aged and elderly individuals.

Comparison of feature importance between RF and CNN

To evaluate the strength of the association between the dependent variable and key predictors, as well as the heterogeneity in feature selection across models, we first computed feature importance in the Random Forest (RF) model using impurity-based importance. This method quantifies each feature’s contribution by measuring the reduction in impurity, here the Gini impurity, achieved by splits on that feature. The global feature importance ranking of the RF model is presented in Fig. 6C.
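The impurity reduction underlying this importance measure can be sketched for a single binary split. The code assumes the Gini index for binary labels; in a random forest, these weighted decreases are typically accumulated over every split on a feature and averaged across trees. The function names are hypothetical.

```python
# Minimal sketch of the Gini-based impurity decrease for one candidate split.
# Labels are coded 0/1 (1 = depression).


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return (
        gini_impurity(parent)
        - len(left) / n * gini_impurity(left)
        - len(right) / n * gini_impurity(right)
    )
```

A split that separates the classes perfectly removes all of the parent's impurity, whereas a split that leaves both children as mixed as the parent contributes nothing to the feature's importance.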

Fig. 6

Differences in Predictor Importance Between CNN and RF Models (6A: Feature importance in the CNN model computed using SHAP; 6B: Feature importance in the RF model computed using SHAP, where Class 1 represents depression and Class 0 represents non-depression; 6C: Feature importance in the RF model computed using impurity-based methods.)

Furthermore, we compared the feature importance derived from SHAP values with that obtained from the impurity-based method (Fig. 6B vs. Fig. 6C). The results indicate a high degree of consistency across the two evaluation techniques, particularly for Disability, Life Satisfaction, Chronic Disease, and Activities of Daily Living (ADL), which consistently demonstrate high importance. This suggests that these variables exert a stable influence within the RF model.

A comparative analysis of CNN (Fig. 6A) and RF (Fig. 6B) based on SHAP importance reveals notable differences in feature importance rankings. Both models prioritize functional health indicators (such as ADL, self-rated health, and chronic diseases) and assign relatively lower predictive significance to demographic characteristics (e.g., gender, age) and health behaviors (e.g., smoking, alcohol consumption). However, a key distinction emerges: CNN exhibits significantly higher heterogeneity in its feature importance distribution, whereas RF demonstrates a more uniform weight distribution pattern.

Differences in the importance of predictive variables for middle-aged and older populations

On the basis of the age cut-off of 60 years in 2018, participants were divided into middle-aged and older samples for grouped predictions and comparisons. Figures 7A and 7C display the importance of predictive variables for middle-aged individuals, whereas Figs. 7B and 7D present the importance for older individuals. The results indicate discrepancies in the top three predictive variables between the two age groups. Disability, life satisfaction, and chronic illness were identified as the top three predictors for the middle-aged group, whereas life satisfaction, chronic illness, and activities of daily living (ADL) impairment were prioritized for the older group. However, the top five predictive variables were consistent across both age groups and aligned with the previously identified top five predictors from the combined middle-aged and older data.

Fig. 7

Differences in feature importance for the middle-aged and elderly groups in the LSTM + CNN model (A, SHAP values for the middle-aged group. C, Mean SHAP values for the middle-aged group. B, SHAP values for the elderly group. D, Mean SHAP values for the elderly group.)

Discussion

Divergence in model performance

This study utilized panel data from the first to fifth waves (2011, 2013, 2015, 2018, and 2020) of the China Health and Retirement Longitudinal Study (CHARLS), covering 3,706 middle-aged and older individuals aged 45 years and above. The dataset encompasses various variables, including demographics, socioeconomic status, lifestyle, and health conditions. By employing an LSTM model, we forecasted the risk factors for the fifth wave on the basis of the preceding four waves and employed five machine learning models, namely, logistic regression, XGBoost, random forest, support vector machine (SVM), and a convolutional neural network (CNN), for depression classification. The results indicated that the combined analysis framework of LSTM and other machine learning models exhibited promising performance in predicting depression among middle-aged and older individuals. Specifically, LSTM effectively captures long-term dependencies in time series through its gating mechanism, extracting dynamic patterns of health, psychological, and social features. For example, it learns how chronic disease progression affects LS over time, how reduced SAPH further accelerates this decline, and how prolonged LS deterioration eventually elevates depression risk. These findings suggest that LSTM not only captures temporal dependencies but also models how different risk factors interact dynamically over extended periods. On the basis of the LSTM-predicted features, the CNN demonstrated the best performance in depression prediction. In our designed predictive model, we observed an AUC ranging from 0.775 to 0.804 and an accuracy ranging from 0.732 to 0.764. Compared with previous studies [32, 34], our model exhibited higher accuracy.

Feature importance in middle-aged and older adults

We further used SHAP analysis to assess which features mattered most for depression prediction. The results revealed that disability, life satisfaction, activities of daily living (ADL) impairment, self-rated health status, and self-reported memory were the top five predictive factors for depression in middle-aged and older individuals. Disability emerged as the most significant predictor in the model. By limiting daily functioning, disability not only increases life challenges but is also associated with reduced social activity, loneliness, and a weakened sense of self-identity, predisposing individuals to emotional distress and depression. Consistent with previous research [52], our study supports the relationship between disability and depressive symptoms. Life satisfaction ranked as the second most important predictive factor. During middle and old age, individuals face retirement, which marks the beginning of a loss of status, power, and prestige [19]. Changes in social and life roles may create psychological burdens, potentially affecting enjoyment of life, positive self-perception, and optimism, thereby reducing quality of life and life satisfaction [19]. Additionally, ADL impairment has been consistently associated with depressive symptoms in older adults. Feeling incompetent or dependent on others in daily activities such as cooking and bathing may lower self-esteem and add psychological burden. According to Yaka et al., reporting poor health status often leads to worries about the consequences of illness and to emotional distress [53]. It is therefore reasonable to infer that poor health status may not only be associated with physical functional impairment but also act as a stressor, exacerbating emotional distress and triggering or worsening depressive symptoms [19, 54].
Consistent with prior research [47], poor self-rated health status has also been identified as an important predictor of depression [19]. Finally, our study suggests that although basic demographic characteristics (marital status), health behaviors (smoking, drinking), and social support (medical insurance and healthcare support) influence depressive symptoms, subjective assessments of quality of life and health status are the most critical factors.
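To make concrete what a SHAP attribution measures, the following minimal sketch computes exact Shapley values by enumerating feature coalitions for a toy risk score. It is an illustration under stated assumptions, not the study's analysis: `toy_risk`, its coefficients, the feature values, and the baseline are all hypothetical, and practical SHAP tooling relies on model-specific approximations rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for instance x relative to a baseline.

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for k in names:
        others = [m for m in names if m != k]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {k}) - value(s))
        phi[k] = total
    return phi


def toy_risk(v):
    # Hypothetical score with one interaction term (not the paper's model).
    return (0.3 * v["disability"]
            - 0.2 * v["life_satisfaction"]
            + 0.1 * v["disability"] * v["adl_impairment"])


x = {"disability": 1.0, "life_satisfaction": 0.0, "adl_impairment": 1.0}
baseline = {"disability": 0.0, "life_satisfaction": 1.0, "adl_impairment": 0.0}
phi = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the score for `x` and the score for the baseline, and the credit for the interaction term is split equally between the two interacting features. How a model represents interactions therefore affects how importance is distributed across features.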

Discrepancies in feature importance between CNN and RF models

The differences in feature importance between the CNN and RF models may stem from their distinct architectures. CNNs learn local patterns through convolutional layers and capture complex high-order feature interactions. As a result, their SHAP importance is often concentrated on a few key variables. For instance, disability and life satisfaction may contain higher-level feature patterns, leading the CNN to assign greater importance to these features.

In contrast, RF uses a decision-tree-based structure, in which SHAP values depend on each feature's contribution to the tree-splitting process. Consequently, RF places more emphasis on the independent impact of individual features on model decisions. For example, social activities and drinking may exert a greater influence on specific category predictions, resulting in higher feature importance in RF.

Differential feature importance between middle-aged and older cohorts

Our study also explored differences in depression prediction between middle-aged and older populations. While the top five predictive factors were similar between the two age groups, their relative importance varied slightly. In the middle-aged group, the top three factors were disability, life satisfaction, and chronic diseases. For most middle-aged individuals, work and family responsibilities are the primary focus, and disability directly impairs work and daily functioning, increasing life burdens and psychological stress [55]. Conversely, in the older population, the three most important predictive factors were life satisfaction, chronic diseases, and activities of daily living (ADL) impairment. Older individuals tend to prioritize overall life satisfaction, not just basic health conditions [56]. Compared with middle-aged individuals, older individuals are more prone to chronic diseases such as hypertension, diabetes, and cardiovascular diseases [57]. These conditions not only affect physical health but may also increase life stress, reduce quality of life, and heighten the risk of depression [58].

This study highlights the potential of involving families, communities, and nonprofessionals in the prevention and treatment of depression. Understanding the risk factors for depression can help middle-aged and older individuals cope better with emotional distress. Training healthcare professionals in early intervention and screening, and developing user-friendly depression screening tools, will contribute to improved treatment outcomes. This research therefore provides practical support and guidance for the prevention and treatment of depression among middle-aged and older people.

Limitations and future directions

This study used data from the China Health and Retirement Longitudinal Study. Although we selected representative features from several perspectives, the feature set was limited. Future research could integrate biological, genetic, neuroimaging, life-event, and other available features. Additionally, owing to the limited timeframe of the study, the LSTM model may not capture the long-term factors and dynamic changes influencing depression. Future research could address this limitation by incorporating more waves of data. To construct a balanced panel dataset, we excluded samples with missing values. Little’s MCAR test indicated that the data were missing completely at random, so future research could apply imputation methods to obtain a more complete and representative dataset. Finally, key depression risk factors, such as chronic disease count, health satisfaction, and life satisfaction, may have complex interrelations [59]. Future research could leverage longitudinal data to explore their underlying mechanisms and causal pathways, deepening the understanding of depression risk. ADL impairment patterns may also differ between chronically ill and healthy older adults [60]. Future research could refine the analysis using sub-models for (1) individuals with no ADL difficulties and (2) individuals with ADL impairment or a need for assistance.
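The complete-case construction of a balanced panel described above can be sketched as follows. This is a minimal illustration, not the study's preprocessing code: the record layout, the field names (`id`, `wave`, `life_satisfaction`), and the toy respondents are assumptions and do not correspond to actual CHARLS variable names.

```python
WAVES = (2011, 2013, 2015, 2018, 2020)


def balanced_panel(records, waves=WAVES):
    """Keep only respondents with a complete record in every wave.

    records: list of dicts with 'id', 'wave', and feature keys;
    None marks a missing value (hypothetical layout).
    """
    by_id = {}
    for row in records:
        by_id.setdefault(row["id"], {})[row["wave"]] = row

    kept = {}
    for pid, rows in by_id.items():
        has_all_waves = all(w in rows for w in waves)
        if has_all_waves and all(
            v is not None for w in waves for v in rows[w].values()
        ):
            kept[pid] = [rows[w] for w in waves]  # ordered by wave
    return kept


# Toy respondents: "a" is complete, "b" skips 2018, "c" has a missing value.
records = [{"id": "a", "wave": w, "life_satisfaction": 3} for w in WAVES]
records += [{"id": "b", "wave": w, "life_satisfaction": 4}
            for w in WAVES if w != 2018]
records += [{"id": "c", "wave": w,
             "life_satisfaction": None if w == 2015 else 2}
            for w in WAVES]

panel = balanced_panel(records)
```

Only respondent "a" survives the filter here. An imputation-based alternative would retain "b" and "c" by filling their gaps instead of dropping them.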

Conclusions

The LSTM + ML model successfully captured high-dimensional and time series information on depression risk factors in middle-aged and older populations. There are notable differences in depression risk factors between middle-aged and older individuals. The predictive model developed in this study holds significant value for healthcare professionals, including doctors, nurses, and community healthcare providers, in the early detection of and intervention for depression.

Data availability

The data used in this study were obtained from the China Health and Retirement Longitudinal Study (CHARLS). The dataset is publicly available and can be accessed through the CHARLS official website: https://charls.charlsdata.com. Additionally, the data can be requested from the corresponding author upon reasonable request.

References

  1. Jaeschke K, Hanna F, Ali S, Chowdhary N, Dua T, Charlson F. Global estimates of service coverage for severe mental disorders: findings from the WHO Mental Health Atlas 2017. Glob Ment Health. 2021;8: e27. https://doiorg.publicaciones.saludcastillayleon.es/10.1017/gmh.2021.19.

  2. Steel Z, Marnane C, Iranpour C, Chey T, Jackson JW, Patel V, et al. The global prevalence of common mental disorders: a systematic review and meta-analysis 1980–2013. Int J Epidemiol. 2014;43(2):476–93. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/ije/dyu038.

  3. Farahzadi M. Depression; let’s talk. J Community Health Res. 2017;6:74–6 https://www.researchgate.net/profile/Mohammad-Farahzadi/publication/318307850_Depression_Let's_talk/links/599f320baca2724fca7a1da1/Depression-Lets-talk.pdf.

  4. Vos T, Allen C, Arora M, Barber RM, Bhutta ZA, Brown A, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the global burden of disease study 2015. Lancet. 2016;388(10053):1545–602. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/S0140-6736(16)31678-6.

  5. Ferrari AJ, Charlson FJ, Norman RE, Patten SB, Freedman G, Murray CJ, et al. Burden of depressive disorders by country, sex, age, and year: findings from the global burden of disease study 2010. PLoS Med. 2013;10(11): e1001547. https://doiorg.publicaciones.saludcastillayleon.es/10.1371/journal.pmed.1001547.

  6. Liu Q, Cai H, Yang LH, Xiang Y-B, Yang G, Li H, et al. Depressive symptoms and their association with social determinants and chronic diseases in middle-aged and elderly Chinese people. Sci Rep. 2018;8(1):3841. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41598-018-22175-2.

  7. Liao S, Zhou Y, Liu Y, Wang R. Variety, frequency, and type of Internet use and its association with risk of depression in middle-and older-aged Chinese: a cross-sectional study. J Affect Disord. 2020;273:280–90. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2020.04.022.

  8. Wang Q, Ding F, Chen D, Zhang X, Shen K, Fan Y, et al. Intervention effect of psychodrama on depression and anxiety: a meta-analysis based on Chinese samples. Arts Psychother. 2020;69: 101661. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.aip.2020.101661.

  9. Zhang L, Xu Y, Nie H, Zhang Y, Wu Y. The prevalence of depressive symptoms among the older in China: a meta-analysis. Int J Geriatr Psychiatry. 2012;27(9):900–6. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/gps.2821.

  10. Alexopoulos GS. Depression in the elderly. Lancet. 2005;365(9475):1961–70. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/S0140-6736(05)66665-2.

  11. Ni Y, Tein J-Y, Zhang M, Yang Y, Wu G. Changes in depression among older adults in China: a latent transition analysis. J Affect Disord. 2017;209:3–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2016.11.004.

  12. Ho CS, Lim LJ, Lim A, Chan NH, Tan R, Lee S, et al. Diagnostic and predictive applications of functional near-infrared spectroscopy for major depressive disorder: a systematic review. Front Psych. 2020;11:378. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpsyt.2020.00378.

  13. Widge AS, Bilge MT, Montana R, Chang W, Rodriguez CI, Deckersbach T, et al. Electroencephalographic biomarkers for treatment response prediction in major depressive illness: a meta-analysis. Am J Psychiatry. 2019;176(1):44–56. https://doiorg.publicaciones.saludcastillayleon.es/10.1176/appi.ajp.2018.17121358.

  14. Gan P, Xie Y, Duan W, Deng Q, Yu X. Rumination and loneliness independently predict six-month later depression symptoms among Chinese elderly in nursing homes. PLoS One. 2015;10(9): e0137176. https://doiorg.publicaciones.saludcastillayleon.es/10.1371/journal.pone.0137176.

  15. Zenebe Y, Akele B, W/Selassie M, Necho M. Prevalence and determinants of depression among old age: a systematic review and meta-analysis. Ann Gen Psychiatry. 2021;20(1):55. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12991-021-00375-x.

  16. Lin H, Jin M, Liu Q, Du Y, Fu J, Sun C, et al. Gender-specific prevalence and influencing factors of depression in elderly in rural China: a cross-sectional study. J Affect Disord. 2021;288:99–106. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2021.03.078.

  17. Chen X, Mo Q, Yu B, Bai X, Jia C, Zhou L, et al. Hierarchical and nested associations of suicide with marriage, social support, quality of life, and depression among the elderly in rural China: machine learning of psychological autopsy data. Front Psych. 2022;13: 1000026. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpsyt.2022.1000026.

  18. Beard JR, Officer A, De Carvalho IA, Sadana R, Pot AM, Michel J-P, et al. The world report on ageing and health: a policy framework for healthy ageing. Lancet. 2016;387(10033):2145–54. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/S0140-6736(15)00516-4.

  19. Fan X, Guo X, Ren Z, Li X, He M, Shi H, et al. The prevalence of depressive symptoms and associated factors in middle-aged and elderly Chinese people. J Affect Disord. 2021;293:222–8. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2021.06.044.

  20. Yan Y, Du Y, Li X, Ping W, Chang Y. Physical function, ADL, and depressive symptoms in Chinese elderly: evidence from the CHARLS. Front Public Health. 2023;11: 1017689. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpubh.2023.1017689.

  21. Feng T, Feng Z, Liu Q, Jiang L, Yu Q, Liu K. Drinking habits and water sources with the incidence of cognitive impairment in Chinese elderly population: the Chinese longitudinal healthy longevity survey. J Affect Disord. 2021;281:406–12. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2020.12.044.

  22. Monroe DC, McDowell CP, Kenny RA, Herring MP. Dynamic associations between anxiety, depression, and tobacco use in older adults: results from the Irish longitudinal study on ageing. J Psychiatr Res. 2021;139:99–105. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jpsychires.2021.05.017.

  23. Kumagai N, Tajika A, Hasegawa A, Kawanishi N, Horikoshi M, Shimodera S, et al. Predicting recurrence of depression using lifelog data: an explanatory feasibility study with a panel VAR approach. BMC Psychiatry. 2019;19:1–12. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12888-019-2382-2.

  24. Van As BAL, Imbimbo E, Franceschi A, Menesini E, Nocentini A. The longitudinal association between loneliness and depressive symptoms in the elderly: a systematic review. Int Psychogeriatr. 2022;34(7):657–69. https://doiorg.publicaciones.saludcastillayleon.es/10.1017/S1041610221000399.

  25. Harshfield EL, Pennells L, Schwartz JE, Willeit P, Kaptoge S, Bell S, et al. Association between depressive symptoms and incident cardiovascular diseases. JAMA. 2020;324(23):2396–405. https://doiorg.publicaciones.saludcastillayleon.es/10.1001/jama.2020.23068.

  26. Kondo K, Antick JR, Ayers CK, Kansagara D, Chopra P. Depression screening tools for patients with kidney failure: a systematic review. Clin J Am Soc Nephrol. 2020;15(12):1785–95. https://doiorg.publicaciones.saludcastillayleon.es/10.2215/CJN.05540420.

  27. Azadbakht M, Tanjani PT, Fadayevatan R, Froughan M, Zanjari N. The prevalence and predictors of diabetes distress in elderly with type 2 diabetes mellitus. Diabetes Res Clin Pract. 2020;163: 108133. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.diabres.2020.108133.

  28. Matcham F, Rayner L, Steer S, Hotopf M. The prevalence of depression in rheumatoid arthritis: a systematic review and meta-analysis. Rheumatology. 2013;52(12):2136–48. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/rheumatology/ket169.

  29. Wiels W, Baeken C, Engelborghs S. Depressive symptoms in the elderly-an early symptom of dementia? A systematic review. Front Pharmacol. 2020;11(34):2020.

  30. Lee Y, Ragguett R-M, Mansur RB, Boutilier JJ, Rosenblat JD, Trevizol A, et al. Applications of machine learning algorithms to predict therapeutic outcomes in depression: a meta-analysis and systematic review. J Affect Disord. 2018;241:519–32. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2018.08.073.

  31. Xu Z, Zhang Q, Li W, Li M, Yip PSF. Individualized prediction of depressive disorder in the elderly: a multitask deep learning approach. Int J Med Informatics. 2019;132: 103973. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.ijmedinf.2019.103973.

  32. Lin S, Wu Y, Fang Y. A hybrid machine learning model of depression estimation in home-based older adults: a 7-year follow-up study. BMC Psychiatry. 2022;22(1):816.

  33. Uyulan C, Ergüzel TT, Unubol H, Cebi M, Sayar GH, Nezhad Asad M, et al. Major depressive disorder classification based on different convolutional neural network models: deep learning approach. Clin EEG Neurosci. 2021;52(1):38–51. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/1550059420916634.

  34. Su D, Zhang X, He K, Chen Y. Use of machine learning approach to predict depression in the elderly in China: a longitudinal study. J Affect Disord. 2021;282:289–98. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2020.12.160.

  35. Doshi JA, Cen L, Polsky D. Depression and retirement in late middle-aged US workers. Health Serv Res. 2008;43(2):693–713. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/j.1475-6773.2007.00782.x.

  36. Dennerstein L, Dudley E, Guthrie J. Empty nest or revolving door? A prospective study of women’s quality of life in midlife during the phase of children leaving and re-entering the home. Psychol Med. 2002;32(3):545–50. https://doiorg.publicaciones.saludcastillayleon.es/10.1017/S0033291701004810.

  37. Djernes JK. Prevalence and predictors of depression in populations of elderly: a review. Acta Psychiatr Scand. 2006;113(5):372–87. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/j.1600-0447.2006.00770.x.

  38. Kim JS, Kang S. A study on body image, sexual quality of life, depression, and quality of life in middle-aged adults. Asian Nurs Res. 2015;9(2):96–103. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.anr.2014.12.001.

  39. Kivelä S-L, Köngäs-Saviaro P, Laippala P, Pahkala K, Kesti E. Social and psychosocial factors predicting depression in old age: a longitudinal study. Int Psychogeriatr. 1996;8(4):635–44. https://doiorg.publicaciones.saludcastillayleon.es/10.1017/S1041610296002943.

  40. Yan Y. Familial affections vis-à-vis filial piety: the ethical challenges facing eldercare under neo-familism in contemporary China. The Journal of Chinese Sociology. 2023;10(1):5. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40711-023-00185-6.

  41. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China health and retirement longitudinal study (CHARLS). Int J Epidemiol. 2014;43(1):61–8. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/ije/dys203.

  42. Boey KW. Cross-validation of a short form of the CES-D in Chinese elderly. Int J Geriatr Psychiatry. 1999;14(8):608–17. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/(sici)1099-1166(199908)14:8<608::aid-gps991>3.0.co;2-z.

  43. Fang M, Mirutse G, Guo L, Ma X. Role of socioeconomic status and housing conditions in geriatric depression in rural China: a cross-sectional study. BMJ Open. 2019;9(5): e024046. https://doiorg.publicaciones.saludcastillayleon.es/10.1136/bmjopen-2018-024046.

  44. Fu H, Si L, Guo R. What is the optimal cut-off point of the 10-item center for epidemiologic studies depression scale for screening depression among Chinese individuals aged 45 and over? An exploration using latent profile analysis. Front Psych. 2022;13: 820777. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpsyt.2022.820777.

  45. Lin S, Wu Y, Fang Y. A hybrid machine learning model of depression estimation in home-based older adults: a 7-year follow-up study. BMC Psychiatry. 2022;22(1):1–13. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12888-022-04439-4.

  46. Whangbo T-K, Eun S-J, Jung E-Y, Park DK, Kim SJ, Kim CH, et al. Personalized urination activity recognition based on a recurrent neural network using smart band. Int Neurourol J. 2018;22(Suppl 2):S91. https://doiorg.publicaciones.saludcastillayleon.es/10.5213/inj.1836168.084.

  47. Handing EP, Strobl C, Jiao Y, Feliciano L, Aichele S. Predictors of depression among middle-aged and older men and women in Europe: a machine learning approach. Lancet Reg Health Eur. 2022;18. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.lanepe.2022.100391.

  48. Montesinos López OA, Montesinos López A, Crossa J. Convolutional neural networks. Multivariate statistical machine learning methods for genomic prediction: Springer; 2022. p. 533–77.

  49. Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138–60. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ACCESS.2018.2870052.

  50. Ballester PL, Cardoso TdA, Moreira FP, da Silva RA, Mondin TC, Araujo RM, et al. 5-year incidence of suicide-risk in youth: a gradient tree boosting and SHAP study. J Affect Disord. 2021;295:1049–56. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2021.08.033.

  51. Moncada-Torres A, van Maaren MC, Hendriks MP, Siesling S, Geleijnse G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci Rep. 2021;11(1):6968. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41598-021-86327-7.

  52. Xin Y, Ren X. Predicting depression among rural and urban disabled elderly in China using a random forest classifier. BMC Psychiatry. 2022;22(1):118. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12888-022-03742-4.

  53. Yaka E, Keskinoglu P, Ucku R, Yener GG, Tunca Z. Prevalence and risk factors of depression among community dwelling elderly. Arch Gerontol Geriatr. 2014;59(1):150–4. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.archger.2014.03.014.

  54. Ma L, Tang Z, Sun F, Diao L, Li Y, Wang J, et al. Risk factors for depression among elderly subjects with hypertension living at home in China. International journal of clinical and experimental medicine. 2015;8(2):2923 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4402903/.

  55. Cree RA. Frequent mental distress among adults, by disability status, disability type, and selected characteristics—United States, 2018. MMWR Morb Mortal Wkly Rep. 2020;69. https://www.cdc.gov/mmwr/volumes/69/wr/mm6936a2.htm.

  56. Lee S-W, Choi J-S, Lee M. Life satisfaction and depression in the oldest old: a longitudinal study. The International Journal of Aging and Human Development. 2020;91(1):37–59. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/0091415019843448.

  57. Jiang CH, Zhu F, Qin TT. Relationships between chronic diseases and depression among middle-aged and elderly people in China: a prospective study from CHARLS. Curr Med Sci. 2020;40(5):858–70. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s11596-020-2270-5.

  58. Bi Y-H, Pei J-J, Hao C, Yao W, Wang H-X. The relationship between chronic diseases and depression in middle-aged and older adults: a 4-year follow-up study from the China health and retirement longitudinal study. J Affect Disord. 2021;289:160–6. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2021.04.032.

  59. Yoon K, Lee M. Factors influencing the health satisfaction of users of public health and medical institutions in South Korea. Front Public Health. 2023;10: 1079347. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpubh.2022.1079347.

  60. Kumagai N, Jakovljević M. Random forest model used to predict the medical out-of-pocket costs of hypertensive patients. Front Public Health. 2024;12: 1382354. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpubh.2024.1382354.

Acknowledgements

We would like to acknowledge the China Health and Retirement Longitudinal Study (CHARLS) team for providing data. We are grateful to all the subjects who participated in the survey.

Funding

This research was supported by the Research Achievements of the Jiangxi Provincial Higher Education Institutions Humanities and Social Science Research Special Project (Educational Impact of Red Culture) (HSWH24034), the Special Project of the Educational Reform Research Program at Jiangxi University of Finance and Economics (JG2024019), and the General Project of Guangdong Provincial Philosophy and Social Science Planning 2024 (GD24CYJ22).

Author information

Contributions

L.Z. and R.G.W. designed the study. L.Z. and R.G.W. collected and processed the data. R.G.W. performed the statistical analyses. R.G.W. and J.W.Z. interpreted the results and wrote the initial draft of the manuscript. X.L.C., L.T., X.Y.N., Z.L.Z., and M.Q.Z. provided critical revisions and contributed to the final manuscript. L.Z. provided funding for the research. All the authors read the manuscript and approved its submission to BMC Psychology.

Corresponding author

Correspondence to Ruigang Wei.

Ethics declarations

Ethics approval and consent to participate

The study was conducted in accordance with the guidelines of the Declaration of Helsinki. Ethical approval was obtained from the Biomedical Ethics Committee of Peking University (IRB 00001052–11015). Informed consent to participate was obtained from all participants prior to data collection.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, L., Wei, R., Zhou, J. et al. Predicting depression and unravelling its heterogeneous influences in middle-aged and older people populations: a machine learning approach. BMC Psychol 13, 395 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40359-025-02691-3

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40359-025-02691-3

Keywords