
Project Dataset

Synthetic Longitudinal Arabic Mental Health Dataset

This project uses a custom-built synthetic dataset designed to support longitudinal modeling of depression and anxiety severity from Arabic text.

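Because each participant contributes a dated sequence of entries, the dataset supports trajectory analysis (how severity changes over time) rather than only static, per-entry classification.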


Dataset Overview

  • Participants


    100 virtual participants
    25 time-stamped text entries per participant

  • Total Entries


    2,500 Arabic text samples (100 participants × 25 entries each) representing varying mental health states over time.

  • Severity Labels


    Two independent labels per entry:
    - Depression Score (0 to 3)
    - Anxiety Score (0 to 3)

  • Longitudinal Structure


    Each entry carries a participant ID and a date, so a participant's entries can be ordered in time to simulate mental health progression.
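
The longitudinal layout described above can be illustrated with a small sketch that groups entries by participant, orders them by date, and checks the 100 × 25 shape. This is only an illustration under assumptions: the function names are hypothetical, dates are assumed to be ISO 8601 strings (YYYY-MM-DD), and the demo rows are made-up placeholders rather than rows from the dataset.

```python
from collections import defaultdict
from datetime import date


def build_trajectories(rows):
    """Group entries by participant and order each group by date.

    `rows` is an iterable of dicts keyed by the dataset's column names.
    Assumption: dates are ISO 8601 strings (YYYY-MM-DD).
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Participant_ID"]].append(
            (
                date.fromisoformat(row["Date"]),
                int(row["Depression_Score"]),
                int(row["Anxiety_Score"]),
            )
        )
    # Tuples sort by their first element, so each list is ordered by date first.
    return {pid: sorted(entries) for pid, entries in grouped.items()}


def has_expected_shape(trajectories, n_participants=100, entries_each=25):
    """Return True if the data matches the participants x entries layout."""
    return len(trajectories) == n_participants and all(
        len(entries) == entries_each for entries in trajectories.values()
    )


# Made-up demo rows (two participants, two entries each), deliberately out of order.
demo_rows = [
    {"Participant_ID": "P001", "Date": "2024-01-08", "Depression_Score": "2", "Anxiety_Score": "1"},
    {"Participant_ID": "P001", "Date": "2024-01-01", "Depression_Score": "1", "Anxiety_Score": "1"},
    {"Participant_ID": "P002", "Date": "2024-01-01", "Depression_Score": "0", "Anxiety_Score": "3"},
    {"Participant_ID": "P002", "Date": "2024-01-08", "Depression_Score": "0", "Anxiety_Score": "2"},
]
demo = build_trajectories(demo_rows)
# Depression scores for one participant, in chronological order.
depression_p001 = [dep for _, dep, _ in demo["P001"]]
```

On the full dataset, `has_expected_shape(trajectories)` with its default arguments would be expected to return True if every one of the 100 participants has exactly 25 entries.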


Labeling Framework

Severity levels follow a four-level structure:

  • 0 → None
  • 1 → Mild
  • 2 → Moderate
  • 3 → Severe

The dataset design was inspired by standardized instruments such as PHQ-9 and GAD-7, while remaining fully synthetic to preserve privacy.
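
One way to keep the four-level scale explicit in code is an integer enumeration. The sketch below is an assumption about how a consumer of the dataset might represent the labels; the names `Severity` and `parse_score` are hypothetical and not part of the dataset.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Four-level severity scale used for both depression and anxiety labels."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


def parse_score(raw):
    """Convert a raw label such as "2" or 2 into a Severity member.

    Raises ValueError if the value is not an integer in the range 0 to 3.
    """
    return Severity(int(raw))
```

Because `IntEnum` members compare as integers, expressions such as `parse_score("3") > Severity.MILD` work without extra conversion, which is convenient when comparing severity across time points.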


Dataset Fields

Column Name        Description
Participant_ID     Unique identifier for each virtual participant
Date               Timestamp representing longitudinal progression
Arabic_Text        Arabic narrative entry
Depression_Score   Severity label from 0 to 3
Anxiety_Score      Severity label from 0 to 3
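
The fields above map naturally onto a typed record. The following sketch assumes the dataset is distributed as a UTF-8 CSV file whose header matches the column names listed here and whose dates are ISO 8601 strings; neither the file format nor the date format is stated on this page, so both are assumptions. `Entry` and `read_entries` are hypothetical names, and the sample row is a made-up placeholder.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    participant_id: str
    entry_date: date
    arabic_text: str
    depression_score: int
    anxiety_score: int


def read_entries(handle):
    """Parse CSV rows into Entry records, rejecting scores outside 0 to 3."""
    entries = []
    for row in csv.DictReader(handle):
        dep = int(row["Depression_Score"])
        anx = int(row["Anxiety_Score"])
        if not (0 <= dep <= 3 and 0 <= anx <= 3):
            raise ValueError(f"score out of range for {row['Participant_ID']}")
        entries.append(
            Entry(
                participant_id=row["Participant_ID"],
                entry_date=date.fromisoformat(row["Date"]),  # assumes YYYY-MM-DD
                arabic_text=row["Arabic_Text"],
                depression_score=dep,
                anxiety_score=anx,
            )
        )
    return entries


# Made-up placeholder row; the Arabic text simply means "sample text".
sample = io.StringIO(
    "Participant_ID,Date,Arabic_Text,Depression_Score,Anxiety_Score\n"
    "P001,2024-01-01,نص تجريبي,1,2\n"
)
entries = read_entries(sample)
```

When reading from disk instead of the in-memory sample, opening the file with `encoding="utf-8"` and `newline=""` keeps the Arabic text and any embedded line breaks intact.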

Ethical Considerations

  • Synthetic Data


    All entries are AI-generated and manually curated for academic research purposes.

  • Privacy Preservation


    No personal data, real patient records, or identifiable information is included.

  • Research Use Only


    The dataset is intended for experimentation, evaluation, and methodological research.


Access

The dataset is available for academic review and experimentation within the scope of this project.

Download Dataset