Project Dataset

Synthetic Longitudinal Arabic Mental Health Dataset

This project uses a custom synthetic dataset built to support longitudinal modeling of depression and anxiety severity from Arabic text.

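Because each participant contributes a dated sequence of entries, the dataset supports trajectory analysis (how severity changes over time) rather than only static, per-entry classification.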

Dataset Overview

Participants

100 virtual participants
25 time-stamped text entries per participant

Total Entries

2,500 Arabic text samples representing varying mental health states over time.

Severity Labels

Two independent labels per entry:

- Depression Score (0 to 3)
- Anxiety Score (0 to 3)

Longitudinal Structure

Each entry includes a participant ID and date field to simulate mental health progression.
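
The longitudinal structure described above can be illustrated with a minimal sketch that groups entries by participant and orders them by date, producing one severity trajectory per participant. The sample rows, the `P001`-style IDs, and the `build_trajectories` helper are hypothetical and chosen only for illustration; the real dataset may format these values differently.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (Participant_ID, Date, Depression_Score, Anxiety_Score).
# The ID format and the dates are illustrative assumptions, not real dataset values.
rows = [
    ("P001", date(2024, 1, 15), 1, 2),
    ("P002", date(2024, 1, 3), 0, 0),
    ("P001", date(2024, 1, 1), 0, 1),
    ("P001", date(2024, 1, 8), 1, 1),
    ("P002", date(2024, 1, 10), 1, 0),
]


def build_trajectories(rows):
    """Group entries by participant and order each group chronologically."""
    grouped = defaultdict(list)
    for participant_id, entry_date, depression, anxiety in rows:
        grouped[participant_id].append((entry_date, depression, anxiety))
    # Sort each participant's entries by their date (the first tuple element).
    return {
        pid: sorted(entries, key=lambda entry: entry[0])
        for pid, entries in grouped.items()
    }


trajectories = build_trajectories(rows)
depression_p001 = [dep for _, dep, _ in trajectories["P001"]]
print(depression_p001)  # [0, 1, 1]
```

Each value in `trajectories` is a date-ordered sequence, which is the shape a trajectory model would typically consume instead of isolated entries.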

Labeling Framework

Severity levels follow a four-level structure:

0 → None
1 → Mild
2 → Moderate
3 → Severe

The labeling design is inspired by standardized screening instruments, the PHQ-9 for depression and the GAD-7 for anxiety, while the data itself remains fully synthetic to preserve privacy.
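
One way to work with the four-level scale in code is to map the integer scores to named levels. The sketch below is an assumption about how a consumer of the dataset might do this; the `Severity` enum and the `describe` helper are hypothetical names, not part of the dataset or the project code.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four severity levels used for both depression and anxiety scores."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


def describe(score: int) -> str:
    """Return the label for a score; raises ValueError outside the 0 to 3 range."""
    return Severity(score).name.capitalize()


print(describe(2))  # Moderate
```

Using an `IntEnum` keeps the scores usable as plain integers while rejecting values outside the defined range.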

Dataset Fields

Column Name	Description
Participant_ID	Unique identifier for each virtual participant
Date	Timestamp used to order each participant's entries chronologically
Arabic_Text	Arabic narrative entry
Depression_Score	Severity label from 0 to 3
Anxiety_Score	Severity label from 0 to 3
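
As a rough sketch of how these fields could be read and checked, the example below parses a small in-memory sample and validates the column names and score ranges. It assumes a comma-separated file with ISO-formatted dates (`YYYY-MM-DD`); the actual file format, the date format, the `P001` IDs, and the sample Arabic sentences are all assumptions made for illustration, and `parse_rows` is a hypothetical helper.

```python
import csv
import io
from datetime import date

EXPECTED_COLUMNS = [
    "Participant_ID", "Date", "Arabic_Text", "Depression_Score", "Anxiety_Score",
]

# Hypothetical two-row sample; the ID format, ISO dates, and texts are assumptions.
SAMPLE_CSV = """Participant_ID,Date,Arabic_Text,Depression_Score,Anxiety_Score
P001,2024-01-01,أشعر بتحسن اليوم,0,1
P001,2024-01-08,لم أستطع النوم جيدا,1,2
"""


def parse_rows(text):
    """Parse CSV text into typed entries, checking columns and score ranges."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    parsed = []
    for row in reader:
        scores = (int(row["Depression_Score"]), int(row["Anxiety_Score"]))
        # Both labels must fall on the 0 to 3 severity scale.
        if any(score not in range(4) for score in scores):
            raise ValueError(f"score out of range in row: {row}")
        parsed.append({
            "participant_id": row["Participant_ID"],
            "date": date.fromisoformat(row["Date"]),
            "text": row["Arabic_Text"],
            "depression": scores[0],
            "anxiety": scores[1],
        })
    return parsed


entries = parse_rows(SAMPLE_CSV)
print(len(entries), entries[1]["date"], entries[1]["anxiety"])  # 2 2024-01-08 2
```

For a real file, the same function could be fed the contents of a file opened with `encoding="utf-8"` so the Arabic text is decoded correctly.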

Ethical Considerations

Synthetic Data

All entries are AI-generated and manually curated for academic research purposes.

Privacy Preservation

No personal data, real patient records, or identifiable information is included.

Research Use Only

The dataset is intended for experimentation, evaluation, and methodological research.

Access

The dataset is available for academic review and experimentation within the scope of this project.

Download Dataset