Frontiers in Physiology 2023 Published

Development and validation of a deep learning-based model to distinguish acetabular fractures on pelvic anteroposterior radiographs

Ye P, Li S, Wang Z, et al.

A neural network read pelvic X-rays and caught acetabular fractures that trained clinicians missed, and it stayed ahead of them on films from two hospitals it had never seen.

0.926 model sensitivity (vs 0.750 across 10 clinicians)
0.988 specificity on the external set (held out of the training data)
26 of 46 hard cases recovered (missed by 5+ human readers)
1,206 radiographs (CT- and surgeon-confirmed)

What the study found

Acetabular fractures (breaks in the hip socket) are easy to miss on a plain pelvic X-ray, because the femoral head overlaps the socket and the fragment can be small; clinicians miss up to one in five. The authors developed a DenseNet-169 model on radiographs from 1,120 patients at a single trauma center, split 3:1 into development and internal test sets and each confirmed by CT/MRI and surgeon consensus, to flag whether a fracture is present and sort it into type A, B, or C. Tested against ten clinicians, it caught more fractures, with about 93% sensitivity against their 75%, and it stayed ahead on X-rays from two outside hospitals (about 87% against 74%). Of the 46 hardest cases, those missed by at least half of the human readers, it still recovered more than half. Grad-CAM heatmaps indicated it was attending to the fracture region rather than to an artifact. A model like this could serve as a second reader in the emergency department.

Key findings

On internal testing the model reached sensitivity 0.926, specificity 0.978, and accuracy 0.952; on external validation across two outside hospitals, sensitivity 0.872, specificity 0.988, accuracy 0.930. Specificity and accuracy held up off the training distribution, while sensitivity fell by about five points.
Across ten clinicians, mean sensitivity was 0.750 internally and 0.735 externally, versus the model’s 0.926 and 0.872: a gap of about 18 percentage points internally and 14 externally.
Discrimination improved with fracture complexity: per-type AUC reached 0.963 (Type A), 0.991 (Type B), and 1.000 (Type C) on internal testing.
Of 46 hard cases missed by five or more of the ten clinicians, the model correctly flagged 26 (56.5%), evidence that it can recover diagnoses humans overlook.
Trained and validated on 1,206 patients — 1,120 from a single trauma center (2013–2021) plus 86 from two independent hospitals — with surgeon consensus and CT/MRI as ground truth.
Grad-CAM activation maps localized the true fracture region for every positive call, rather than spurious image features.
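
For readers who want the arithmetic behind these figures, the sketch below shows how sensitivity, specificity, and accuracy follow from a confusion matrix, and how the headline gaps fall out of the reported values. The confusion-matrix counts are hypothetical and chosen only to make the formulas concrete; they are not the paper's counts. The function and variable names are likewise illustrative.

```python
def sensitivity(tp, fn):
    """Share of true fractures that were flagged (true-positive rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of fracture-free films that were correctly cleared."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all films that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# HYPOTHETICAL counts, for illustration only (not taken from the paper).
tp, fn, tn, fp = 90, 10, 95, 5
example_sens = sensitivity(tp, fn)       # 90 / 100
example_spec = specificity(tn, fp)       # 95 / 100
example_acc = accuracy(tp, tn, fp, fn)   # 185 / 200

# Values as reported in the summary above.
model_sens = {"internal": 0.926, "external": 0.872}
clinician_sens = {"internal": 0.750, "external": 0.735}

# Sensitivity gap in percentage points, model minus mean clinician.
gap_points = {
    cohort: round((model_sens[cohort] - clinician_sens[cohort]) * 100, 1)
    for cohort in model_sens
}

# Share of the 46 hard cases that the model recovered.
hard_case_recovery_pct = round(26 / 46 * 100, 1)

print(gap_points)
print(hard_case_recovery_pct)
```

The gaps are differences in sensitivity, so they describe additional fractures caught per 100 true fractures, not a share of the fractures clinicians missed.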

Design: Retrospective; single-center development with two-hospital external validation
Model: DenseNet-169 convolutional neural network
Data: 1,206 patients (1,120 development and test, 2013–2021; 86 external)
Reference standard: Surgeon consensus with CT/MRI confirmation
Comparison: 10 clinicians (attendings and residents)

ROC curves set the model against the ten clinicians, each of whom appears as a single operating point; the model’s curve sits above every human operating point, on both the internal and the external set. Figure available CC BY from Ye et al., Front. Physiol. 2023.
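
As a rough illustration of how such a comparison is read, the sketch below builds an ROC curve from a set of scores and checks whether a reader's operating point lies under it. The labels and scores are toy values invented for this example and do not come from the study; only the clinician operating point (mean sensitivity 0.750, specificity 0.909 on the internal set) is taken from the reported results. The helper names `roc_points`, `tpr_at`, and `auc` are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping the threshold high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all cases tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def tpr_at(points, fpr):
    """Highest TPR the curve reaches at a given FPR (linear interpolation)."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                y = max(y0, y1)
            else:
                y = y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# TOY data, invented for illustration (1 = fracture, 0 = no fracture).
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

curve = roc_points(labels, scores)
toy_auc = auc(curve)

# Mean clinician operating point on the internal set, as reported.
clinician_fpr = 1 - 0.909
clinician_tpr = 0.750
curve_is_above = tpr_at(curve, clinician_fpr) > clinician_tpr
```

A reader's point falls below the curve when, at the same false-positive rate, the model reaches a higher true-positive rate.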

Read the abstract

This study developed and tested a deep learning (DL) model to distinguish acetabular fractures (AFs) on pelvic anteroposterior radiographs (PARs) and compared its performance with that of clinicians. 1,120 patients from a single trauma center were enrolled and split 3:1 into development and internal test sets, with a further 86 patients from two independent hospitals serving as an external validation cohort. All acetabular fractures were classified into types A, B, and C using three-column classification theory, with surgeon consensus and CT/MRI confirmation as the reference standard. A DenseNet-169 convolutional neural network was trained on preprocessed radiographs. On the internal test set the model achieved a sensitivity of 0.926, specificity of 0.978, and accuracy of 0.952; on external validation it achieved sensitivity 0.872, specificity 0.988, and accuracy 0.930. Across ten clinicians the mean sensitivity/specificity/accuracy was 0.750/0.909/0.829 (internal) and 0.735/0.909/0.822 (external) — the model outperformed clinicians on every metric, with the largest gain in sensitivity. Performance rose with fracture complexity (Type C AUC = 1.000). Among 46 cases missed by five or more clinicians, the model correctly identified 26 (56.5%). Grad-CAM heatmaps localized the regions driving each prediction. The authors conclude that a DL model can detect and triage acetabular fractures on plain radiographs at a level competitive with — and on sensitivity superior to — trauma clinicians, with potential to reduce missed diagnoses in the emergency setting.
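
The cohort arithmetic in the abstract can be sketched as follows. This is a minimal illustration assuming a simple random patient-level split; the abstract does not say how the 3:1 split was drawn (for example, whether it was stratified by fracture type), and the exact set sizes are not stated in this summary. The function `split_patients`, the placeholder IDs, and the seed are all hypothetical.

```python
import random


def split_patients(patient_ids, ratio=(3, 1), seed=0):
    """Shuffle patient IDs and cut them into two groups at the given ratio."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = len(ids) * ratio[0] // sum(ratio)
    return ids[:cut], ids[cut:]


# Placeholder IDs standing in for the 1,120 single-center patients.
single_center_ids = range(1120)
development, internal_test = split_patients(single_center_ids)

# External cohort size as reported: 86 patients from two hospitals.
external_size = 86
total_patients = len(development) + len(internal_test) + external_size

print(len(development), len(internal_test), total_patients)
```

Splitting by patient rather than by image keeps any one patient's films out of both sets at once, which is the usual way to avoid leakage between development and test data.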

Cite

Ye P, Li S, Wang Z, et al. Development and validation of a deep learning-based model to distinguish acetabular fractures on pelvic anteroposterior radiographs. Frontiers in Physiology 2023

DOI 10.3389/fphys.2023.1146910

Read the full paper →