Skip to content

Evaluation Methodology

Quantitative and Class-Level Assessment Framework

This section describes the evaluation protocol used to assess the predictive performance and robustness of the dual-model framework for depression and anxiety severity classification.

The evaluation emphasizes both aggregate performance metrics and detailed class-level behavior.


Evaluation Objectives

The evaluation is designed to:

  • Quantify classification performance across all severity levels
  • Assess robustness for higher-severity cases
  • Examine ordinal boundary confusion
  • Compare performance between depression and anxiety models
  • Ensure reproducibility and transparent reporting

Train–Test Partitioning

The dataset was divided using stratified random sampling to preserve severity-level proportions:

  • Training set: 1,750 entries (70 percent)
  • Testing set: 750 entries (30 percent)

Stratification ensures that none, mild, moderate, and severe categories are proportionally represented in both subsets, reducing class imbalance bias.

All reported metrics are computed exclusively on the held-out test set.


Evaluation Metrics

Performance was assessed using standard multi-class classification metrics:

  • Accuracy: Overall proportion of correct predictions
  • Precision (per class): True positives divided by predicted positives
  • Recall (per class): True positives divided by actual positives
  • F1-score (per class): Harmonic mean of precision and recall

To provide balanced evaluation:

  • Macro-average metrics treat all classes equally
  • Weighted-average metrics account for class support

Both reporting styles are included to ensure fair interpretation across severity levels.


Class-Level Performance Analysis

Given the ordinal nature of severity levels, analysis focused on:

  • Confusion between adjacent categories such as moderate and severe
  • Stability in identifying higher-risk classes
  • Rare extreme misclassifications such as severe predicted as none

Because mental health severity exists on a continuum, minor overlap between neighboring classes is expected and does not necessarily indicate instability.


Confusion Matrix Examination

Confusion matrices were analyzed to visualize prediction distributions.

Key interpretation criteria:

  • Strong diagonal dominance indicates reliable class separation
  • Off-diagonal concentration near adjacent classes reflects ordinal boundary overlap
  • Minimal severe-to-none misclassification supports high-risk detection reliability

ROC and AUC Evaluation

Receiver Operating Characteristic curves were computed using a one-vs-rest strategy for each severity class.

The Area Under the Curve (AUC) measures discriminative ability independent of a fixed decision threshold.

AUC analysis provides:

  • Class-wise separability insight
  • Comparative difficulty between depression and anxiety models
  • Stability across varying classification thresholds

Dual-Model Comparison

Separate SVM classifiers were trained for:

  • Depression severity
  • Anxiety severity

Comparative analysis evaluates:

  • Relative performance across equivalent severity classes
  • Sensitivity to higher-severity cases
  • Differences in boundary ambiguity

Observed performance differences are discussed in the Results section.


Reproducibility Considerations

  • Stratified sampling ensures consistent severity distribution
  • Hyperparameters are tuned to optimize generalization
  • Models are serialized for consistent reuse
  • Evaluation is performed on a fixed held-out test set

These measures support replicability and transparent review.


Evaluation Limitations

The evaluation protocol is subject to constraints:

  • Labels are simulated rather than clinically verified
  • Predictions are generated at the entry level without sequence modeling
  • Synthetic data may produce clearer separability than real-world narratives

As a result, reported performance reflects controlled experimental conditions rather than clinical validation.

Further discussion appears in the Limitations and Future Work sections.