Technology

Technical approach of the PROSurvival project

The PROSurvival approach combines federated learning and predictive pattern recognition to enable efficient multi-site deep learning training. Whole-slide imaging (WSI) data is mined centrally, while its correlation with the associated clinical data is computed in a privacy-preserving, federated manner. The clinical data is used to construct clinical endpoints that are highly relevant for patient treatment and prognosis. To correlate WSIs and associated data without compromising patients’ privacy, PROSurvival will implement a privacy-preserving federated learning infrastructure based on existing frameworks. The federated learning pipeline will be trained in two steps:

  • Step 1, extract predictive patterns: The WSI data, which is less sensitive from a data protection perspective, will be centrally analyzed and condensed to clinically predictive pattern information using approaches from neural image compression and multi-task learning. By extracting patterns across multiple datasets, a novel abstract representation is learned that is less dependent on the particularities (such as staining or preparation bias) of a single clinical site. The extraction of predictive patterns reduces the amount of data to be transferred and facilitates analysis on off-the-shelf hardware at the clinical sites.
  • Step 2, data linkage and federated training: At the clinical sites, WSI data and/or condensed predictive patterns will be linked to the associated clinical data. A joint deep learning model will then be trained in a federated training loop, with a central server orchestrating the model updates and their aggregation. Techniques from differential privacy will be used to reduce the risk of data leakage during training.
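
The federated training loop of Step 2 can be illustrated with a minimal sketch. It is not the PROSurvival implementation: it assumes a simple logistic regression model in place of a deep network, synthetic feature vectors standing in for the condensed predictive patterns, and a plain unweighted average as the aggregation rule. All function names and parameters (local_update, clip_update, federated_round, max_norm, noise_std) are hypothetical, and the noise level is not calibrated to any formal privacy budget.

```python
import numpy as np


def local_update(weights, features, labels, lr=0.1, epochs=5):
    """Train locally at one site; only the weight delta leaves the site."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(features @ w)))
        grad = features.T @ (preds - labels) / len(labels)
        w -= lr * grad
    return w - weights


def clip_update(update, max_norm):
    """Scale the update so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, max_norm / (norm + 1e-12))


def federated_round(weights, sites, rng, max_norm=1.0, noise_std=0.01):
    """One server round: collect clipped site updates, average, add noise."""
    clipped = [
        clip_update(local_update(weights, X, y), max_norm) for X, y in sites
    ]
    # Unweighted mean, so every site has the same bounded influence.
    mean_update = np.mean(clipped, axis=0)
    # Gaussian noise in the spirit of differential privacy (assumed scale).
    noise = rng.normal(0.0, noise_std, size=weights.shape)
    return weights + mean_update + noise


def make_site(rng, n_samples, true_w):
    """Synthetic stand-in for one site's pattern vectors and binary endpoint."""
    X = rng.normal(size=(n_samples, true_w.size))
    y = (X @ true_w > 0).astype(float)
    return X, y


def accuracy(weights, sites):
    """Fraction of correctly classified samples over all sites (demo only)."""
    X = np.vstack([s[0] for s in sites])
    y = np.concatenate([s[1] for s in sites])
    return float(np.mean(((X @ weights) > 0) == (y > 0.5)))


def run_demo(seed=0, rounds=30):
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=8)
    sites = [make_site(rng, n, true_w) for n in (120, 80, 200)]
    w = np.zeros(8)
    for _ in range(rounds):
        w = federated_round(w, sites, rng)
    return w, accuracy(w, sites)


if __name__ == "__main__":
    final_w, acc = run_demo()
    print(f"accuracy over all sites: {acc:.2f}")
```

In this sketch the raw site data never reaches the server; only clipped weight updates are shared. Clipping bounds the contribution of each site, which is what makes the added noise meaningful. A real deployment would derive the noise scale from a chosen privacy budget and would use an established federated learning framework.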

We will focus on obtaining results from clinical data and keep the infrastructure as lean as possible while maintaining the integrity and privacy of clinical data. We will base the implementation on open-source federated learning methods and on components for data and AI model management from the EMPAIA project.

The main result of the data analysis in PROSurvival will be a multi-site trained model that can easily be extended to additional sites.

The image data will be provided in a format that conforms to established standards such as DICOM and follows the guidelines of the MI-I and the relevant NFDI consortia.