Using tree-based models to identify factors contributing to trait negative affect in adults

Cañizares, Catalina; Gómez-Maquet, Yvonne; Ferro, Eugenio; Torres, Carlos Arturo; Agudelo, Diana María; Odom, Gabriel

doi:10.1186/s40359-024-02245-z

Research
Open access
Published: 01 February 2025

Using tree-based models to identify factors contributing to trait negative affect in adults

Catalina Cañizares¹,
Yvonne Gómez-Maquet²,
Eugenio Ferro³,
Carlos Arturo Torres²,
Diana María Agudelo² &
…
Gabriel Odom¹

BMC Psychology volume 13, Article number: 92 (2025) Cite this article

831 Accesses
Metrics details

Abstract

Background

Individuals with high levels of negative affect (NA) are at an increased risk of experiencing distress and negative self-views. Theoretical models suggest that NA plays a critical role in psychopathology, particularly in Major Depressive Disorder (MDD), and is linked to cognitive-perceptual and affective regulation issues.

Objective

Determine whether maladaptive cognitive schemas, attributional style, childhood adversity, and lifestyle factors (including alcohol and drug use and physical activity) could effectively predict negative affect (NA) in adults.

Methods

A secondary data analysis was performed on a sample of 342 depressed and non-depressed adults. Beta regression and regression tree analyses were conducted to identify the principal risk factors and their interactions. The regression tree model was trained with 5-fold cross-validation on 75% of the sample, with 25% of observations held for testing.

Results

The findings revealed that the cognitive schemas of disconnection and rejection and impaired autonomy had a significant impact on the likelihood of higher scores on the State Depression Inventory (IDER) test (p < 0.001), as indicated by both beta regression and regression tree analyses. Additionally, childhood adversity emerged as a crucial factor in determining high levels of NA. The regression tree model achieved strong performance metrics, including an R-squared value of 0.77.

Conclusions

This study represents a significant step forward in the understanding of NA, as it considers a broad range of individual factors, such as cognitive schemas, lifestyle, and demographics, to predict its impact on NA, with potential implications for prevention programs aimed at reducing NA.

Peer Review reports

Background

Major Depressive Disorder (MDD) is a highly prevalent mental disorder worldwide, affecting nearly 280 million people globally and causing significant disability [1]. In Colombia, the most recent data from the National Survey of Mental Health [2] reports a prevalence rate of 4.3% for MDD, 1% for minor depression, and 0.5% for dysthymia, underscoring its prominence in the country. According to classic theoretical frameworks [3] and research [4], Negative Affect (NA) is a central component of MDD. NA refers to the tendency to experience unpleasant emotions, such as anger, guilt, worry, sadness, and disgust [3]. Individuals with high levels of NA are more likely to experience distress and hold a negative self-view, in contrast to those with Positive Affect (PA), who generally exhibit contentment and a positive self-perception [5].

NA has also been linked to specific personality dimensions [6]. Neuroticism is associated with high NA, while extraversion is associated with high PA [6]. Considering NA and PA as part of the dimensional understanding of personality allows for conceptualizing them in terms of both state and trait [6,7,8]. Trait refers to a relatively stable predisposition to respond in a certain way to situations, while state refers to emotional and mental conditions that are transitory [6]. Therefore, the interaction between state and trait characteristics enables a dimensional understanding of psychopathology, rather than simply considering the frequency of occurrence of symptoms [7, 9]. Therefore, the relationship between NA and psychopathology has been extensively studied, with numerous studies demonstrating a positive relationship between NA and various mental disorders [4, 10].

Despite the established link between NA and psychopathology, research exploring NA’s relationship with other psychological processes, particularly cognition, remains scarce. Specifically, the connection between negative affect and maladaptive schemas has not been extensively examined, and the literature lacks robust experimental or correlational studies to support this relationship. Early theorists posited that maladaptive schemas can lead to NA, manifesting as feelings of sadness, anxiety, and anger, which in turn reinforce these negative beliefs, creating a cycle of negative thinking [11]. Our review of the literature indicates that NA is correlated with various cognitive-perceptual and affective regulation issues [4], including unconventional beliefs, distorted assessments of stimuli, and impaired decision-making [12]. Additionally, high levels of negative affect and early maladaptive schemas have been observed in patients with major depression compared to healthy controls [13]. A pessimistic attributional style has also been found to correlate with negative affect and depressed mood in college students, though this has been less studied in clinical populations [14].

Given the significant impact and relevance of NA in psychopathology, particularly in MDD, several instruments have been developed to measure its presence. Among these, the IDER questionnaire -gets its name from its Spanish title, “Inventario de Depresión Estado Rasgo”- stands out as a valuable tool for evaluating the severity and variability of MDD symptoms, as well as the presence of NA [7, 9]. What sets the IDER questionnaire apart is its ability to measure both state and trait aspects of MDD, with a high level of sensitivity for detecting the presence of NA [7, 9]. It has proven useful in both clinical and research settings, particularly in areas such as personality, emotion, neuropsychology, and cognition [7]. Importantly, the IDER questionnaire focuses on affective symptoms of depression, rather than somatic and physical aspects, making it a measure to evaluate the presence of NA rather than a diagnostic tool [9]. Additionally, the instrument has been tested for its psychometric properties in Colombian samples, demonstrating high levels of reliability (0.71 to 0.86) and confirming the bifactorial structure (dysthymia and euthymia) in exploratory and confirmatory factorial analyses [6].

Considering that theoretical models suggest that NA plays a critical role in psychopathology especially in MDD [3] and MDD correlates with early maladaptative schemas [15] it is essential to identify factors that increase an individual’s likelihood of having higher levels of trait NA. Therefore, in this study, we aim to investigate the extent to which various factors, such as cognitive schemas, attributional style, childhood adversity, and lifestyle factors, including alcohol, drug use, and physical activity, can accurately predict trait NA in a sample of both depressed and non-depressed adults in Colombia using tree-based models.

It is hypothesized that the presence of maladaptive cognitive schemas, negative attributional style, childhood adversity, and lifestyle factors (including alcohol and drug use, and physical activity) will significantly correlate with trait NA. Specifically, maladaptive cognitive schemas and a pessimistic attributional style are expected to be positively associated with higher levels of trait NA. Additionally, greater childhood adversity and higher levels of substance use are anticipated to be positively associated with trait NA, whereas higher levels of physical activity are expected to be negatively associated with trait NA.

Recently, there have been applications of decision tree methods in mental health research given it automatizes detection of main effects and interactions [16]. The advantage of using tree-based models in this study is that they can handle complex relationships between predictors and outcomes, such as intricate interaction effects [17]. Additionally, tree-based models provide an interpretable and intuitive visualization method that can be easily communicated to clinicians [17]. By identifying factors that increase an individual’s likelihood of having higher levels of trait NA, this study may lead to more targeted interventions aimed at reducing NA and improving mental health outcomes in both depressed and non-depressed adults.

Methods

This study is a secondary data analysis utilizing data collected from a previous study conducted by Lattig and Collegues [18] that investigated the relationship between depression, genetics, and psychosocial variables. By leveraging the existing data, we aim to gain new insights into the factors contributing to NA scores, providing a complementary perspective to the original study.

Participants

The study recruited a total of 342 individuals, out of which 171 were diagnosed with Major Depressive Disorder (MDD) and recruited from two psychiatric hospitals in Bogota. The diagnosis was confirmed using the DSM IV-TR [19] criteria. To be included in the depressive group, participants had to be inpatients with MDD, over 18 years of age, and have completed at least basic primary education. Those with comorbid substance abuse or dependence, psychotic disorders, or dementia, as well as those with bipolar depression or delirium were excluded. The other 171 were individuals from the general population who had no history of MDD, were over 18 years of age, and had completed basic primary education or higher. Participants were excluded from the control group if they had past or present mental disorders or a family relationship with a case subject. These diagnoses were also confirmed by the Mini-International Neuropsychiatric Interview M.I.N.I [20].

Measures

The research team created a specialized questionnaire exclusively for this study, aimed at collecting detailed information on demographics, substance use, physical exercise, and exposure to childhood trauma. This questionnaire is original to our research and has not been published elsewhere. The English version can be found in the supplementary materials of this paper.

Demographics. The personal questionnaire instrument gathered participants’ sex (male, female) and age.

Substance use. The personal questionnaire also collected data on an individual’s alcohol, psychoactive substance, and cigarette usage. Participants were asked to report any usage in the past 30 days and the frequency of their daily consumption.

Physical Exercise. The personal questionnaire gathered information on the participants’ exercise routines through questions on their weekly physical activity or sports participation.

Childhood Adversity. The personal questionnaire seeked information on the participant’s history of childhood abuse, specifically asking if they have experienced any form of physical and psychologival abuse and the age range in which it occurred.

Attributional Style. The Life Experiences Survey (LES) was the tool used to measure an individual’s attributional style in the face of stressful situations. The LES [21] is a 66-item self-report questionnaire designed to evaluate a broad range of stress-inducing life events that an individual has experienced in the past two years. The questionnaire assesses the frequency and perceived stress level of each event, with a rating scale ranging from 0 (not at all stressful) to 4 (highly stressful). Additionally, the LES evaluates the perceived positivity/negativity, expectation, and control of each event. This section uses dichotomous options, where participants classify each stressful event as positive or negative, expected or unexpected, and whether they felt in control or not in control of the situation. The scoring of the test involves summing the responses for the stress scale, resulting in a total stress score and separate totals for each of the dichotomous questions. For the dichotomous questions, a value of 1 is assigned to the response of interest (negative, unexpected, not in control), and a value of 0 is assigned to the positive response (positive, expected, in control). Thus, this scale produces four different scores per participant. The reliability of the LES has been confirmed with a Cronbach’s alpha of 0.82, and it has been shown to have normative data and validity through various studies [21]. In the current sample, Cronbach’s alpha for the total scale was 0.84.

Cognitive Schemas. To measure early maladpatative schemas The Young Schema Questionnaire-Short Form (YSQ-SF) was used. The YSQ-SF is 75 items long questionnaire that consists of a subset of 5 items from each of the original 15 scales (the 5 items that load most strongly on the factors derived from factor analysis) [22]. Participants are asked to rate each of the 5 items on a six-point Likert scale, ranging from 1 (completely untrue of me) to 6 (describes me perfectly). Scores for each schema are found by adding the total number of responses within each schema. A study conducted in a Colombian population confirmed the presence of the 15 schemas and demonstrated good psychometric properties, with reported Cronbach’s alpha values ranging from 0.73 to 0.88 for the different schemas [23]. In the current sample, Cronbach’s alpha for the total scale was 0.94.

State Depression Inventory (IDER). The IDER questionnaire is a tool used to assess NA for depression both in terms of its frequency (trait) and intensity (state) at the time of evaluation [7]. It consists of 20 items divided into two scales: Trait and State, each with 10 items, with 5 items for dysthymia and 5 items for euthymia. The subject’s responses are scored based on their chosen answer option, ranging from 1 for “almost never” to 4 for “almost always.” The scores are assigned to items related to dysthymia on both scales. For the items related to euthymia, the score is reversed. The final score of a scale is obtained by summing the results of the two sub-scales, with a range from 20 to 80. The reliability levels of the test are reported to be high, ranging from 0.71 to 0.92 for the different scales in the general population, with Cronbach’s alphas ranging from 0.71 to 0.86 in the Colombian population [6]. In this study, the trait scale was chosen as the outcome measure, high scores range from 40 to 24, average levels range from 22 to 17 and low levels are 16 or less [24]. In the current sample, Cronbach’s alpha for the trait scale scale was 0.94.

Procedure

The original study was approved by the ethics committees of the participating institutions and an independent research ethics committee. Written informed consent was obtained from those who were interested in participating. The M.I.N.I. structured interview [20] was administered by a trained psychologist on the research team to confirm inclusion criteria and rule out exclusion criteria. Eligible participants then completed a battery of questionnaires, administered by one of the trained psychologists on the research team.

For this study the de-identified dataset was used. This dataset was previously curated to include only the psychosocial and cognitive variables.

Data analysis

To explore the factors associated with NA, we performed beta regression and regression tree analysis (CART) on the entire dataset without distinguishing between the depressed and non-depressed groups. By analyzing the whole sample as a single group, we were able to capture the full range of possible scores on the IDER scale and examine NA as a construct present in both groups. Considering that NA is a construct that is related to, yet different from, depression given that individuals may experience NA whether or not they are depressed [25].

Both Beta regression and CART were fitted to compare the results using different statistical techniques. Tree-based methods are a type of predictive modeling that seeks to identify similar groups of people based on their outcomes using a series of yes/no questions. These questions are plotted in a graphical format as a tree or bush, with the root representing the initial question and subsequent branches representing subsequent questions. To make predictions for a new person, the tree is traversed from the root to the end of the last branch by answering the yes/no questions. Questions that appear close to the root or frequently throughout the tree are more important for grouping people with similar outcomes.

Tree-based methods can produce very complex trees with small subgroups of people, which can be simplified by pruning or removing certain branches. The optimal amount of pruning is determined by testing different values of the cost complexity parameter (CP) on different parts of the data and choosing the value that results in the best performance on a separate set of data. By simplifying the tree, the model can be made more interpretable and easier to use in practice.

The beta regression model provides a standard approach for mental health audiences to interpret the regression results, while the regression tree output is a useful tool for clinical decision-making as it presents decision heuristics and pathways that clinicians can use when working with clients who have high levels of NA. The beta regression model is suitable for identifying the most significant predictors of NA, while the regression tree model can help clinicians tailor their treatment plans based on individual client needs. Therefore, the two models provide complementary insights that can be used to enhance the understanding and management of NA in mental health settings.

We chose to fit the data in a beta regression considering that the IDER score is conditionally distributed rather than Gaussian [26], meaning it is a continuous variable bounded between 10 and 40. To proceed the analysis we transformed the IDER scores to be between 0 and 1 using the following formula

$$\:{x}_{1}=\frac{{x}_{1}-\text{lower}}{\text{upper}-\text{lower}}$$

where $\:{x}_{1}$ is a specific score, lower is the minimum score in the range of scores, in this case 10 and upper is the highest score meaning 40.

Given that the data contains values that are at the upper and lower bound it was necessary to apply the lemon squeezer [26] transformation to squeeze the data that lies in [0,1] to be in (0,1). The tranformation applied was

$$\:x{\prime\:}=\frac{x\left(N-1\right)+s}{N}$$

where N is the sample size and s is a constant between 0 and 1. the s acts as if we were taking a prior into account so we decided to choose 0.5 conidering a Bayesian standpoint [26].

The beta regression was fitted using the betareg function from the betareg package in R 4.2.0. The regression tree was fitted using the tidymodels package in R 4.2.0. In preparation for modeling, the data was divided into training (75%) and testing sets (25%) using the initial_split function from the rsample packages. The division of the data was done to ensure that the model was trained on the training set and its performance could be accurately evaluated on the test set. The regression tree model was specified using the r_part engine in regression mode.

To avoid overfitting the model, 5-cross-validation was performed using the vfold_cv function from the rsample package. To optimize the model, a grid search was performed to find the best cost complexity parameter (CP). The tune_grid function from the tune package was used to specify a grid of possible cost complexity parameters to test, and the fit function was used to fit the model for each value of the cost CP. The performance metrics, Mean Absolute Error (MAE) and RMSE, were used to compare the results for each value of the CP and select the best model. Finally, the best CP was selected and the final regression tree model was built using the entire training data and was then evaluated on the validation set. All code and analysis scripts are available in the authors GitHub repository. The results of the model fit is displayed in a tree structure (Fig. 1) and summarized using performance metrics.

Results

Sample characteristics

Table 1 presents the demographic and lifestyle characteristics of the sample. The sample comprised 249 women and 93 men, with an age range spanning from 18 to 86 years, a mean age of 35.4 years (SD = 11.9), and a median age of 33.0 years. Among the non-depressed participants (N = 171), 78.9% were female and 21.1% were male, while among the depressed participants (N = 171), 66.7% were female and 33.3% were male. The mean age for non-depressed participants was 36.2 years (SD = 10.8), and for depressed participants, it was 34.6 years (SD = 12.8). In terms of age distribution, the majority of participants were in the early adulthood phase (25–34 years), comprising 37.7% of the sample, followed by young adults (18–24 years) at 19.6%. The mid-adult group (35–44 years) represented 18.7%, mature adults (45–54 years) 15.8%, older adults (55–64 years) 7.3%, and senior adults (65 + years) 0.9%. Childhood adversity was reported by 38.6% of participants, with a significant difference between non-depressed (22.8%) and depressed (54.4%) groups. Regarding lifestyle factors, 39.8% of participants reported engaging in physical exercise, with higher levels among non-depressed participants (45.6%) compared to depressed participants (33.9%). Smoking cigarettes was reported by 19.9% of participants, with a higher incidence in the depressed group (29.8%) compared to the non-depressed group (9.9%). Alcohol use was reported by 14.9% of participants, more common among depressed participants (22.2%) than non-depressed participants (7.6%). Finally, psychoactive substance use was reported by 3.2% of participants, with a higher proportion in the depressed group (5.8%) compared to the non-depressed group (0.6%).

Table 1 Demographics and life style variables for 342 depressed and non-depressed participants

Full size table

Table 2 presents the maladaptive cognitive schemas and cognitive attribution scores for the sample of 342 participants, comprising 171 depressed and 171 non-depressed individuals. The IDER score, which indicates the level of maladaptive cognitive schemas, was higher in depressed participants, with a mean score of 28.6 (SD = 6.76) compared to 14.7 (SD = 2.83) in non-depressed participants. The overall mean IDER score was 21.7 (SD = 8.66), with a median score of 19.0. In the domain of Disconnection and Rejection, depressed participants scored a mean of 18.1 (SD = 4.99), while non-depressed participants scored 9.32 (SD = 3.34), with an overall mean of 13.7 (SD = 6.11). For Impaired Autonomy, the mean score was 16.1 (SD = 4.65) for depressed participants and 8.10 (SD = 2.58) for non-depressed participants, resulting in an overall mean of 12.1 (SD = 5.49). In the Impaired Limits domain, depressed participants had a mean score of 18.8 (SD = 4.68), compared to 11.6 (SD = 4.16) in non-depressed participants, with an overall mean of 15.2 (SD = 5.72). The Other-Directedness domain showed a mean score of 19.8 (SD = 4.96) for depressed participants and 12.8 (SD = 3.77) for non-depressed participants, with an overall mean of 16.3 (SD = 5.63). In the Over-Vigilance/Inhibition domain, depressed participants scored a mean of 20.0 (SD = 4.14), while non-depressed participants scored 15.3 (SD = 4.10), with an overall mean of 17.6 (SD = 4.74). Additionally, 84.2% of depressed participants had high negative attribution scores, compared to 31.0% of non-depressed participants. Unexpected attribution was reported by 64.3% of depressed participants and 36.3% of non-depressed participants. Out of control attribution was reported by 38.6% of depressed participants and 4.1% of non-depressed participants. These findings indicate differences in cognitive schemas and attributions between depressed and non-depressed individuals.

Table 2 IDER, Maladpatative Cognitive Schemas and Attribution scores for 342 depressed and non-depressed participants

Full size table

Beta regression

The beta regression model results are presented in Table 3.To verify the model’s assumptions, we visually inspected the beta regression using R’s plot function. The generated graphics revealed that all assumptions were well within the acceptable range, the residuals appeared to be randomly distributed around zero, and the Cook’s distance plot did not indicate the presence of influential observations. Therefore, the beta regression model is valid for interpreting our data. The plots are available at the authors GitHub repository and in the appendix.

We also conducted a variance inflation factor (VIF) analysis to detect multicollinearity in the beta regression model. The results indicated that most predictor variables had VIF values below 3.9, indicating a moderate degree of multicollinearity. However, the Disconnection and Rejection variable had a VIF of 4.9, which could suggest a stronger correlation with other predictor variables in the model. Nevertheless, since none of the VIF values exceeded the recommended threshold of 5 or 10 [27], we concluded that the collinearity was not severe enough to affect our analysis’s overall findings.

Table 3 Beta regression for the IDER score in a sample of 255 depressed and non-depressed adults

Full size table

The predictors age, disconnection and rejection, impaired autonomy, negative attribution, out of control attribution, childhood adversity, and smoking cigarettes, are all statistically significant at the 0.05 level, meaning that it is unlikely to have arisen by chance. The negative coefficients for Negative Attribution, Out of Control Attribution, Childhood Adversity, and Smoking Cigarettes suggest that not having these predictors is associated with a decrease in the log odds of experiencing NA (These coefficients are in comparison to the “yes” reference group). In contrast, having these predictors is associated with an increase in the log odds of experiencing NA. Additionally, a negative coefficient for age suggests that as age increases, the odds of experiencing NA decreases.

The results also show that disconnection and rejection and impaired autonomy are significantly associated with the odds of experiencing NA. Each one unit increase in disconnection and rejection increases the log odds of experiencing NA by 0.06 units, while each one unit increase in impaired autonomy increases the log odds of experiencing NA by 0.08 units (in a scale from 0 to 1).

Table 3 displays the precision results for the mean model, which includes the pseudo R-squared of 0.60 that demonstrates a strong fit for the model as noted by Giselmar and collegues [28]. However, it is important to keep in mind that the pseudo R-squared should not be directly compared to the R-squared of linear regression models. This is because the former only measures how well the model fits the data, and does not estimate the accounted variability.

Regression tree results

Using the selected CP value, the regression tree model achieved an R-squared value of 0.77 on the testing data, indicating that 77% of the variance in the IDER score is explained by the predictors chosen. This substantial R-squared value signifies that the model effectively captures a significant portion of the variance. Moreover, with a Root Mean Square Error (RMSE) of 4.01, the model demonstrates a reasonably accurate prediction capability for the IDER score.

The final decision tree, as shown in Fig. 1, incorporates nine variables, including four cognitive domains: disconnection and rejection (dyrysq), impaired autonomy (padysq), impaired limits (liysq) and over-vigilance/inhibition (seiysq); two attributional styles: negative attribution (is_negative) and out of control attribution (is_no_control); two lifestyle variables: alcohol (is_alcohol) use and smoking cigarettes (is_smoke); and two demographic variables: age and childhood adversity (is_abuse).

The root node of the tree includes 255 observations, with a mean IDER score of 21.88. The disconnection and rejection variable (dyrysq) is the most critical factor in determining high or low IDER scores, with a cutoff value of 17. The tree also features three other main splits based on impaired autonomy (padysq), childhood adversity (is_abuse), and negative attribution (is_negative). These variables are crucial in determining high or low levels of NA.

The decision tree indicates that a score of at least 35 points in the IDER instrument (last blue box to the right) is predicted by the interaction of a 17 or higher score in the disconnection and rejection cognitive domain, a score higher than 21 in the impaired autonomy schema domain, and not consuming alcohol but smoking cigarettes. On the other hand, lower scores are determined by having low scores in the disconnection and rejection schema, lower scores in the impaired autonomy domain, not interpreting stressful situations as negative, and being 48 years of age or older.

In summary, the decision tree depicts several interactions between cognitive, lifestyle, and demographic variables associated with higher levels of NA. Upon analyzing the results of the beta regression and regression tree, it is apparent that there is a correspondence between the two methods, and they provide complementary information. The beta regression demonstrates that cognitive schemas such as disconnection and rejection, and impaired autonomy significantly increase the likelihood of higher scores in the IDER test (p < 0.001). These two schemas’ effects are evident in the tree as they are the first two splits, indicating their higher importance in estimating higher levels of NA. The regression tree reveals two additional schemas that are not significant in the beta regression, impaired autonomy and over-vigilance/inhibition. In combination with high levels of disconnection and rejection, impaired autonomy predicts higher scores in the IDER.

Furthermore, both methods agree that childhood adversity’s presence or absence is critical in determining high or low levels of NA. The beta regression suggests that not being subjected to childhood adversity decreases the chances of high scores in the IDER, and the tree regression complements this by showing that low scores are possible in the absence of childhood adversity, but this is only achievable with the presence of interpreting stressful events as under control, not consuming alcohol, and lower scores in the disconnection and rejection schema.

In the end, both methods identify the same predictors as crucial in determining the IDER score’s values, but they complement each other by revealing further interactions. Additionally, the tree delves deeper and shows how alcohol consumption may also be relevant in determining the score.

Discussion

This study aimed to examine the extent to which cognitive schemas, lifestyle factors, and demographic characteristics could accurately predict NA, as measured by the IDER questionnaire. It used Beta Regression and tree analysis as quantitative methods to evaluate and compare the factors associated with NA in a sample of adults with depression scores ranging from low to high values. Our results indicate that both methods agree on the significant impact of cognitive schemas, attributional styles, lifestyle, and demographic variables in predicting NA. Furthermore, both models demonstrated strong predictive performance, with the regression tree accounting for a considerable proportion of the variance in NA.

These findings suggest that four maladaptive cognitive schemas—namely disconnection and rejection, impaired autonomy, over-vigilance/inhibition and impaired limits—are associated with higher levels of NA. The beta regression analysis showed significant effects only for disconnection and rejection and impaired autonomy, with no significant effect found for impaired limits or over-vigilance/inhibition. Interestingly, the regression tree analysis did capture impaired limits and over-vigilance/inhibition as a maladaptive schema that increases the risk of NA. This discrepancy between the models can be explained by the nature of the regression tree, where the most important splits are at the top. Both disconnection and rejection, as well as impaired autonomy, are among the top splits, indicating their primary importance, which is in concordance with the beta regression. In contrast, impaired limits appears in the fourth split, and over-vigilance/inhibition in the third, suggesting it is not as critical on their own. However, the tree model highlights theit role within interactions, showing that impaired limits and over-vigilance/inhibition becomes important when combined with other factors. This interaction effect provides new insights into the complex dynamics of cognitive schemas in predicting negative affect, that cannot be captured by the linear beta regression model. With generalized regression models, such as the beta regression, there is only a limited amount of information that can be captured by the researcher. This is demonstrated in this results, where the tree, through its interactions, was able to show the importance of other variables in relation to each other in increasing scores of NA.

Introducing new models to analyze social data is essential for the advancement of science. Moving beyond generalized linear models to represent complex real-life phenomena is crucial, and this study exemplifies how supervised machine learning models can be applied to further our understanding of negative affect. These models capture interactions that may be initially unseen by the researcher, thus pushing the boundaries of what is already known.

The results from the beta regression and the tree are consistent with previous research that has shown that the five domains of maladaptive schemas, particularly impaired autonomy and disconnection and rejection, are closely related to the severity of depressive symptoms [29]. This supports the idea that higher NA is related to more persistent and severe symptoms. Therefore, understanding the presence of cognitive schemas is crucial in comprehending the impact of high levels of NA and the corresponding persistence and severity of depressive symptoms.

This study furthers the understanding of cognitive schemas and NA by examining their interactions with other cognitive schemas. While confirming previous findings, it also adds that in the presence of disconnection and rejection, impaired autonomy, over-vigilance/inhibition, and impaired limits, NA scores will increase. Additionally, when these schemas are combined with risk factors such as alcohol and cigarette use, the scores increase significantly more.

The innovation of this study is twofold: It confirms that using both a generalized linear model and a machine learning model yields coherent and similar results, while also providing further information by revealing interactions. This demonstrates the value of incorporating advanced analytical methods to uncover the complex dynamics of cognitive schemas and their impact on negative affect.

Furthermore, this study emphasizes the significance of cognitive schemas over other factors in an individual’s life, as it demonstrates that the most critical factor in determining a higher score was a cognitive schema. While other variables were included in the model after the cognitive schema, this underscores the importance of therapies that aim to improve the flexibility of one’s interpretation and perception of life to achieve better outcomes and reduce negative feelings associated with NA.

The results of this study suggest that negative attribution and the perception of uncontrollability during stressful situations contribute to higher levels of NA. This finding is consistent with previous research that suggests that attributional style is more a product of current mood than a trait-like mode of thinking that increases susceptibility to clinical depression. This is supported by the stronger association between attributional style and current mood found in previous studies [30]. Furthermore, other studies have linked the interpretation of uncontrollability to the severity of depressive symptoms, indicating that the relationship between the perception of uncontrollability and negative affect may be influenced by the level of trait negative affectivity [31, 32].

Another significant finding of this study is the impact of childhood adversity on the likelihood of higher NA. The results showed that childhood adversity was a significant predictor in both models, and the decision tree analysis revealed that being a victim of childhood adversity in association with disconnection and rejection, as well as impaired autonomy, led to a score of approximately 26 in the IDER questionnaire, indicating moderate to high levels of trait NA. This finding aligns with previous research linking childhood adversity to the enduring experience of negative emotions and recurrent and persistent depressive episodes ( [33]). It highlights the importance of early intervention and support for individuals who have experienced childhood adversity to decrease the negative impact on their emotional well-being, overall mental health and ideally, prevention of mental disorders like MDD.

Finally, the results of the study demonstrate a significant relationship between smoking and alcohol use and NA. This finding is consistent with previous evidence showing strong associations between smoking and mental health disorders characterized by persistent NA [34]. Indeed, a large body of literature supports the positive association between smoking status and neuroticism [34], which is a disposition to experience negative affects such as anger and anxiety [35]. As for alcohol use, the tree-based model revealed that it was a relevant factor at two different splits to predict the IDER score. Previous research has shown that higher levels of NA are specifically linked to the initiation and relapse of alcohol and other substance use disorders [36,37,38,39].

Moreover, the use of two different quantitative methods to explore the relationship between NA and the introduced predictors is an innovative approach to understanding psychological constructs. This method offers evidence of replicability and internal validity, as both methods agree on most of the significant predictors. Moreover, the tree method provides clinicians with a way to identify which factors related to others are protective or riskier, and represents critical trends that contribute to higher NA scores. This can help clinicians make more informed decisions when designing treatment plans to improve outcomes for patients.

There are several limitations to this study that should be considered. First, it is important to note that this study is a secondary data analysis and the data was not collected specifically to answer the research question being investigated. As a result, the external validity of the study may be limited, as the sample only includes a specific population of depressed individuals and their matched healthy controls. Another limitation is that the data was collected cross-sectionally, which means that causality cannot be established between the predictors and the outcome. When prediction is mentioned, it is in the context of predicting data points into classification, like in a two-by-two table. Another limitation is related to the imbalance between genders; more than half of the sample was composed of women. Although depression has been disproportionately reported by women, in fact, almost twice as often as by men [40], it is relevant to interpret these results with caution for the general population, as the majority are women. Finally, a more comprehensive measurement of psychological and medical factors could help identify other predictors of negative affect and better understand the complex interplay between these factors. Future studies on negative affect should consider longitudinal designs that take into account cultural diversity and include measurements of other psychosocial and medical factors. To improve the generalizability of the findings, future research should aim to use a more diverse sample that includes a wider range of individuals and populations, and a balanced sex ratio. Additionally, longitudinal studies would provide more information on the development and persistence of negative affect over time, which could help identify potential intervention points.

Conclusions

In conclusion, our study supports and adds to the previous evidence linking NA to cognitive schemas, attributional style, childhood adversity, and lifestyle behaviors by demonstrating the interaction between these predictors. Our model provides a comprehensive picture of how the combination of these variables can accurately predict higher levels of NA thus contributing to a deeper understanding of the development and maintenance of MDD. The findings support the hypothesis that maladaptive cognitive schemas and negative attributional styles are positively associated with higher levels of trait NA.

The results of this study hold significant practical implications for clinical settings. The use of machine learning techniques, as demonstrated in this study, can offer valuable insights for predicting and making decisions by providing clinicians with pathways that lead to more severe levels of NA and possible poor outcomes for the mental disorders it has been related to. If clinicians are provided with these types of tools that contain heuristics specific to particular populations and problems, it could increase awareness of critical risk factors for the persistence and maintenance of depressive disorders. This research represents a significant advancement in the understanding of NA because it takes into account various aspects of an individual, such as their cognitive schemas and interpretations, lifestyle, and demographics, to show that the interaction of these three factors is crucial in determining high levels of NA and, consequently, more persistent symptoms and greater vulnerability to present depression.

Data availability

The datasets generated and analyzed in the current study are available from the corresponding author on reasonable request.

References

WHO. Depression [Internet]. World Health Organization. World Health Organization. 2021. https://www.who.int/news-room/fact-sheets/detail/depression
Encuesta nacional de salud mental (tomo i). Ministerio de Salud y Protección Social; 2015.
Clark LA, Watson D. Tripartite model of anxiety and depression: psychometric evidence and taxonomic implications. J Abnorm Psychol. 1991;100(3):316–36.
Article PubMed Google Scholar
Jeronimus BF, Kotov R, Riese H, Ormel J. Neuroticism’s prospective association with mental disorders halves after adjustment for baseline symptoms and psychiatric history, but the adjusted association hardly decays with time: a meta-analysis on 59 longitudinal/prospective studies with 443 313 participants. Psychol Med. 2016;46(14).
Watson D, Clark LA. Negative affectivity: the disposition to experience aversive emotional states. Psychol Bull. 1984;96(3):465–90.
Article PubMed Google Scholar
Agudelo Vélez DM, Gómez Maquet Y, López PL. Propiedades psicométricas del inventario de depresión estado rasgo (IDER) con una muestra de población general colombiana. Av en Psicología Latinoam. 2014;32(1):71–84.
Article Google Scholar
Buela-Casal G, Agudelo Vélez DM. IDER inventario de depresión estado-rasgo. S.A.: TEA Ediciones; 2008.
Google Scholar
Ramos J, Sánchez A, Doll A. Personalidad, afecto y estilo de afrontamiento: interacciones en trastorno de personalidad grave. Behav Psychology/Psicología Conductual. 2021;29(3):699–719.
Google Scholar
Spielberger D, Carretero-Dios H, De los Santos Roig M, Gualberto B. Spanish experimental version of the state-trait depression questionnaire (ST-DEP): state sub-scale (s-DEP). Int J Clin Health Psychol. 2002;2.
Stanton K, Watson D. Positive and negative affective dysfunction in psychopathology. Social and Personality Psychology Compass [Internet]. 2014;8(9):555–67. https://compass.onlinelibrary.wiley.com/doi/abs/https://doi.org/10.1111/spc3.12132
Young JE, Klosko JS, Weishaar ME. Schema therapy: a practitioner’s guide. New York: Guilford; 2003.
Google Scholar
Cacioppo J, Berntson G. The affect system: Architecture and operating characteristics. Curr Dir Psychol Sci. 1999;8:133–7.
Article Google Scholar
Vélez DMA, Maquet YG, Castro CU, Africano IM. Relación entre afecto negativo y dominios de esquemas maladaptativos tempranos como variables de Personalidad en El trastorno depresivo mayor: Un estudio de casos y controles. Revista De Psicologı́a:(Universidad De Antioquı́a). 2020;12(2):11–5.
Google Scholar
Luten AG, Ralph JA, Mineka S. Pessimistic attributional style: is it specific to depression versus anxiety versus negative affect? Behav Res Ther. 1997;35(8):703–19.
Article PubMed Google Scholar
Bishop A, Younan R, Low J, Pilkington PD. Early maladaptive schemas and depression in adulthood: A systematic review and meta-analysis. Clinical Psychology & Psychotherapy [Internet]. 2022;29(1):111–30. https://onlinelibrary.wiley.com/doi/abs/10.1002/cpp.2630
Jiang T, Gradus JL, Rosellini AJ. Supervised machine learning: A brief primer. Behavior Therapy [Internet]. 2020;51(5):675–87. https://www.sciencedirect.com/science/article/pii/S0005789420300678
Neumann M, Klatt T. Identifying predictors of inpatient verbal aggression in a forensic psychiatric setting using a tree-based modeling approach. J Interpers Violence. 2021;37:1–26.
Google Scholar
Latigg M, Villamizar V, Gaviria A, Ángel J, Cañizares C, Ferro E, et al. BDNF methylation and stress response in a clinical population with major depressive disorder. Biol Psychiatry. 2017;81:S96.
Article Google Scholar
American Psychiatric Association. Diagnostic and statistical manual of mental disorders: DSM-IV-TR. 4th ed., text rev. Washington, DC: Autor; 2000.
Ferrando L, Bobes J, Giber J, Soto M, Soto O. M.i.n.i. Mini international neuropsychiatric interview. Versión en español 5.0.0. DSM-IV. 2000;2023(03). https://www.fundacionforo.com/pdfs/mini.pdf
Sandín B, Chorot P. Cuestionario de sucesos vitales (CSV): Estructura factorial, características psicométricas y datos normativos. Revista de Psicopatología y Psicología Clínica [Internet]. 2017;22(2):95–115. https://revistas.uned.es/index.php/RPPC/article/view/19729
E YJ. Young schema questionnaire–short form (YSQ-SF, YSQ-s, YSQ). APA PsycTests [Internet]. 1998; https://psycnet.apa.org/doiLanding?doi=10.1037/t12644-000
Londoño N, Schnitter M, Marín C, Calvete E, Ferrer A, Maestre K, et al. Young schema questionnaire-short form: Validación en colombia. Universitas Physiol. 2012;11:147–64.
Google Scholar
Spielberger CD, Buela-Casal G, Agudelo D. IDER: Inventario De depresión estado-rasgo. TEA; 2008.
Danhauer SC, Legault C, Bandos H, Kidwell K, Costantino J, Vaughan L et al. Positive and negative affect, depression, and cognitive processes in the cognition in the study of tamoxifen and raloxifene (co-STAR) trial. Aging, Neuropsychology, and Cognition [Internet]. 2013;20(5):532–52. https://doi.org/10.1080/13825585.2012.747671
Smithson M, Verkuilen J. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol Methods. 2006;11:54–71.
Article PubMed Google Scholar
James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning: With applications in r [Internet]. Springer; 2013. https://faculty.marshall.usc.edu/gareth-james/ISL/
Hemmert GAJ, Schons LM, Wieseke J, Schimmelpfennig H. Log-likelihood-based pseudo-R2 in logistic regression: Deriving sample-sensitive benchmarks. Sociological Methods & Research [Internet]. 2018;47(3):507–31. https://doi.org/10.1177/0049124116638107
Chen KH, Tam CWC, Chang K. Early maladaptive schemas, depression severity, and risk factors for persistent depressive disorder: a cross-sectional study. East Asian Arch Psychiatry. 2019;29(4):112–7.
Article PubMed Google Scholar
Ball HA, McGuffin P, Farmer AE. Attributional style and depression. Br J Psychiatry. 2008;192(4):275–8.
Article PubMed Google Scholar
Usuga Jerez AJ, Lemos Ramírez NV, Pinzón Ardila JL, Pérez Rivero PF, Uribe Rodríguez AF. Sucesos vitales estresantes, ansiedad y depresión en estudiantes de una universidad privada de bucaramanga. Informes Psicológicos [Internet]. 2021;21(2):61–74. https://revistas.upb.edu.co/index.php/informespsicologicos/article/view/7204
Camuñas N, Mavrou E, Miguel Tobal JJ. Ansiedad y tristeza-depresión: Una aproximación desde la teoría de la indefensión-desesperanza. Revista de Psicopatología y Psicología Clínica [Internet]. 2019;24(1). https://revistas.uned.es/index.php/RPPC/article/view/23003
Nanni V, Uher R, Danese A. Childhood maltreatment predicts unfavorable course of illness and treatment outcome in depression: a meta-analysis. Am J Psychiatry. 2012;169(2):141–51.
Article PubMed Google Scholar
Kassel JD, Stroud LR, Paronis CA. Smoking, stress, and negative affect: correlation, causation, and context across stages of smoking. Psychol Bull. 2003;129(2):270–304.
Article PubMed Google Scholar
Widiger TA, Oltmanns JR. Neuroticism is a fundamental domain of personality with enormous public health implications. World Psychiatry. 2017;16(2):144–5.
Article PubMed PubMed Central Google Scholar
Stasiewicz PR, Maisto SA. Two-factor avoidance theory: The role of negative affect in the maintenance of substance use and substance use disorder. Behavior Therapy [Internet]. 1993;24(3):337–56. https://www.sciencedirect.com/science/article/pii/S0005789405802102
Cooney NL, Litt MD, Morse PA, Bauer LO, Gaupp L. Alcohol cue reactivity, negative-mood reactivity, and relapse in treated alcoholic men. J Abnorm Psychol. 1997;106(2):243–50.
Article PubMed Google Scholar
Shoal GD, Giancola PR. Cognition, negative affectivity and substance use in adolescent boys with and without a family history of a substance use disorder. Journal of Studies on Alcohol [Internet]. 2001;62(5):675–86. https://doi.org/10.15288/jsa.2001.62.675
Measelle JR, Stice E, Springer DW. A prospective test of the negative affect model of substance abuse: moderating effects of social support. Psychol Addict Behav. 2006;20(3):225–33.
Article PubMed PubMed Central Google Scholar
Shi P, Yang A, Zhao Q, Chen Z, Ren X, Dai Q. A hypothesis of gender differences in self-reporting symptom of depression: implications to solve under-diagnosis and under-treatment of depression in males. Front Psychiatry. 2021;12:589687.
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We would like to express our gratitude to Minciencias (712.2015/908 /120471250970), to the Universidad of Los Andes, our assistants, and the psychiatry Institutions. Finally, this research would not have been possible without the voluntary cooperation of all the patients and people that participated in this study.

Funding

This study was financed by a research grant from the Colombian Ministry of Science Minciencias (712.2015/908 /120471250970) and the Facultad de Ciencias – Universidad de los Andes (INV-2019-84-1851) Neither the design, the collection, the analysis and interpretation of data nor the manuscript and submission of the article partook of the participation of the financing entity.

Author information

Authors and Affiliations

Florida International University, 11200 SW 8th St, Miami, FL, 33199, USA
Catalina Cañizares & Gabriel Odom
Universidad de los Andes, Cra. 1 #18a-12, La Candelaria, Bogotá, Cundinamarca, Colombia
Yvonne Gómez-Maquet, Carlos Arturo Torres & Diana María Agudelo
Instituto Colombiano del Sistema Nervioso Clínica Montserrat, Cl. 134 #17, Usaquén, Bogotá, Colombia
Eugenio Ferro

Authors

Catalina Cañizares
View author publications
Search author on:PubMed Google Scholar
Yvonne Gómez-Maquet
View author publications
Search author on:PubMed Google Scholar
Eugenio Ferro
View author publications
Search author on:PubMed Google Scholar
Carlos Arturo Torres
View author publications
Search author on:PubMed Google Scholar
Diana María Agudelo
View author publications
Search author on:PubMed Google Scholar
Gabriel Odom
View author publications
Search author on:PubMed Google Scholar

Contributions

CC and YG: Wrote the first and last draft of the manuscript and all authors contributed to and have approved the final manuscript. CC: Designed the study and conducted the statistical analysis. YG: Supervised, reviewed and edited the manuscript. CAT: Conducted literature searches and provided summaries of previous research studies. EF, DMA: Reviewed and edited the manuscript. GO: Guided the application of statistical, mathematical, computational, and other formal techniques to analyze and synthesize the data.

Corresponding author

Correspondence to Catalina Cañizares.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ethics Committee from Los Andes University, the Ethics Committee from Instituto Colombiano del Sistema Nervioso Clínica Montserrat, the Ethics Committee from Clínica la Inmaculada, and the CEI Campo Abierto Comité de Ética en Investigación.

All methods were carried out adhering to the Declaration of Helsinki. Strict adherence was maintained to the ethical principles of respect for persons, beneficence, and justice. Respect for persons involved ensuring participants’ autonomy and informed consent. The original data collection was conducted with written informed consent obtained from each participant, who voluntarily joined the study without receiving any financial compensation. Beneficence focused on maximizing possible benefits and minimizing possible harms. Justice was ensured by providing clear explanations of the research, conducting interviews with advanced trained psychologists, and maintaining the confidentiality of participants’ information. For this secondary data analysis, respect for persons was upheld by anonymizing all data to protect participant confidentiality and privacy.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Cañizares, C., Gómez-Maquet, Y., Ferro, E. et al. Using tree-based models to identify factors contributing to trait negative affect in adults. BMC Psychol 13, 92 (2025). https://doi.org/10.1186/s40359-024-02245-z

Download citation

Received: 24 May 2023
Accepted: 02 December 2024
Published: 01 February 2025
DOI: https://doi.org/10.1186/s40359-024-02245-z

Using tree-based models to identify factors contributing to trait negative affect in adults

Abstract

Background

Objective

Methods

Results

Conclusions

Background

Methods

Participants

Measures

Procedure

Data analysis

Results

Sample characteristics

Beta regression

Regression tree results

Discussion

Conclusions

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s note

Electronic supplementary material

Supplementary Material 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Psychology

Contact us