| dc.description.abstract |
Timely identification of urgent patient messages is critical for effective clinical
decision-making in mobile health (mHealth) programs, particularly in low-resource
settings where healthcare workers manage large volumes of incoming short message
service (SMS) communication. In Kenyan maternal and child health programs,
nurses manually triage multilingual patient messages, a process that contributes to
delayed responses and increased risk of missed urgent cases. This study investigates
the effectiveness of contextual natural language processing (NLP) models for
automatically classifying patient SMS messages into urgency categories within a
real-world mHealth environment. Using a dataset of 11,129 manually labelled
multilingual SMS messages from 772 participants enrolled in the Mobile Solutions
for Women and Children’s Health (Mobile WACh NEO) program in Kenya, urgency
detection was formulated as a supervised binary classification task aligned with
clinical triage workflows. Baseline models employing unigram and bigram features
with penalized logistic regression were compared against contextual embedding
approaches, including multilingual BERT (mBERT), SwahBERT, and AfriBERT.
Transformer models were adapted to the clinical domain through domain-specific
pretraining and task-adaptive fine-tuning. To mitigate contextual sparsity inherent in
short SMS messages, prior nurse or system messages were concatenated with the
current message to form context-aware input representations. Model development
followed explicit train, development, and test splits, with cross-validation applied
during training to support robust model selection and reduce overfitting. Performance
was evaluated using precision, recall, and F1-score, emphasizing clinical utility for
both triage and prioritization objectives. Transformer architectures substantially
outperformed frequency-based baselines, achieving F1 improvements of up to 0.186
relative to bigram models. The best-performing model was an mBERT model that
received task-level adaptive pretraining with nurse context before fine-tuning. It
achieved a precision of 50%, a recall of 45%, and an F1-score of 47%, all below the
thresholds set for either a triage model or a prioritization model. However,
incorporating nurse conversational context reduced performance gaps between
configurations (e.g., ΔF1 decreasing from approximately 0.080 in non-contextual
mBERT to 0.032 with nurse context), while task-adaptive pretraining provided
incremental yet consistent gains. Although performance did not fully meet
predefined clinical usefulness thresholds, context-aware fine-tuned transformer
models demonstrated improved recall, indicating reduced risk of missed urgent
messages. Overall, the findings indicate that contextual transformer-based models
offer meaningful advantages over traditional representations in multilingual,
low-resource clinical SMS environments. While additional advances in architecture and
domain adaptation are needed to reach optimal deployment standards, the results
align with contemporary state-of-the-art NLP practices and support the feasibility of
automated decision-support tools to augment nurse triage workflows in mHealth
systems.
Keywords: Urgency Detection, Contextual NLP, Multilingual Transformers,
mHealth, Clinical Decision Support |
en_US |