TICON: A Slide-Level Tile Contextualizer for Histopathology Representation Learning

1Stony Brook University 2MICS, CentraleSupélec, Université Paris-Saclay 3UNC Charlotte 4Argonne National Laboratory 5University of Chicago 6Archimedes/Athena RC 7Independent Researcher
* Co-first and Co-second Authors
Teaser

TICON: An Omni Tile Contextualizer that can contextualize embeddings from any tile encoder. Solid lines (—) represent input projectors for the tile encoders used in pretraining; dashed lines (- -) represent input projectors used when adapting TICON to new tile encoders.

Abstract

The interpretation of small tiles in large whole slide images (WSIs) often requires broader image context. We introduce TICON, a transformer-based tile representation contextualizer that produces rich, contextualized embeddings for "any" application in computational pathology. Standard tile-encoder-based pipelines, which extract embeddings of tiles stripped from their context, fail to model the rich slide-level information essential for both local and global tasks. Furthermore, different tile encoders excel at different downstream tasks, so a unified model is needed to contextualize embeddings derived from "any" tile-level foundation model. TICON addresses this need with a single, shared encoder, pretrained with a masked modeling objective to simultaneously unify and contextualize representations from diverse tile-level pathology foundation models. Our experiments demonstrate that TICON-contextualized embeddings significantly improve performance across many different tasks, establishing new state-of-the-art results on tile-level benchmarks (i.e., HEST-Bench, THUNDER, CATCH) and slide-level benchmarks (i.e., Patho-Bench). Finally, we pretrain an aggregator on top of TICON, using only 11K WSIs, to form a slide-level foundation model that outperforms SoTA slide-level foundation models pretrained with up to 350K WSIs.

Pretraining: Omni-Feature Masked Modeling

Method

Overview of the pretraining framework. (Left) Grid sampling and tile embedding extraction using a set of tile encoders (φ1, φ2, ..., φT). (Right) An input tile encoder (φi) is sampled randomly at each iteration, and its embeddings are masked. The remaining visible embeddings are passed through a φi-specific input projector (ρi) and then a shared encoder. A shared decoder, paired with output projectors specific to each tile encoder, then reconstructs the masked embeddings corresponding to all tile encoders (φ1, ..., φT).
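
To make the pretraining recipe concrete, the following is a minimal PyTorch-style sketch of one omni-feature masked-modeling step. The module names, widths, masking ratio, and optimizer setup are illustrative assumptions rather than the released implementation, and positional information for the tile grid is omitted for brevity.

# Minimal sketch of one omni-feature masked-modeling step (hypothetical names and dimensions).
import random
import torch
import torch.nn as nn

T, D_MODEL, MASK_RATIO = 3, 768, 0.75           # number of tile encoders, shared width, mask ratio (assumed)
TILE_DIMS = [1024, 1536, 768]                   # embedding dims of tile encoders phi_1..phi_T (assumed)

input_projs = nn.ModuleList([nn.Linear(d, D_MODEL) for d in TILE_DIMS])    # rho_i, one per tile encoder
output_projs = nn.ModuleList([nn.Linear(D_MODEL, d) for d in TILE_DIMS])   # one output head per tile encoder
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True), num_layers=12)
decoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True), num_layers=4)
mask_token = nn.Parameter(torch.zeros(1, 1, D_MODEL))

def pretrain_step(tile_embs, optimizer):
    """tile_embs[t]: (B, N, TILE_DIMS[t]) grid-sampled embeddings of the same tiles from tile encoder phi_t."""
    B, N, _ = tile_embs[0].shape
    i = random.randrange(T)                                                       # sample one input tile encoder per iteration
    keep = torch.rand(B, N).argsort(dim=1)[:, : int(N * (1 - MASK_RATIO))]        # visible positions
    masked = torch.ones(B, N, dtype=torch.bool).scatter_(1, keep, False)          # True = masked position

    # The shared encoder sees only the visible embeddings of phi_i, projected by rho_i.
    visible = torch.gather(tile_embs[i], 1, keep.unsqueeze(-1).expand(-1, -1, TILE_DIMS[i]))
    latent = encoder(input_projs[i](visible))                                     # (B, n_keep, D_MODEL)

    # The shared decoder re-inserts mask tokens at masked positions, then decodes the full sequence.
    full = mask_token.expand(B, N, -1).clone()
    full.scatter_(1, keep.unsqueeze(-1).expand(-1, -1, D_MODEL), latent)
    decoded = decoder(full)

    # Reconstruction loss on masked positions, summed over all target tile encoders phi_1..phi_T.
    loss = sum(((output_projs[t](decoded) - tile_embs[t]) ** 2)[masked].mean() for t in range(T))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()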

Inference Modes

Method

Overview of TICON's inference modes. (Left) Standard preprocessing pipeline: tiling the WSI followed by embedding extraction. (Middle) Contextualized Inference: the default mode, where the entire sequence of WSI tile embeddings is passed through the TICON Encoder. This allows the model to contextualize each tile with information from the full slide-level neighborhood. (Right) Isolated Inference: an alternative mode where a single tile embedding is passed through TICON independently. In this setting, the Transformer effectively functions as a deep MLP (sequence length of 1). Although this is not the primary design intent, we empirically find that TICON exhibits an emergent property in this mode: it enhances individual tile representations even when slide-level context is unavailable (e.g., in the THUNDER benchmark).
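
As a complement to the figure, here is a hedged sketch of the two inference modes, reusing the hypothetical input_projs and encoder modules from the pretraining sketch above; only the shared encoder is used at inference time in this sketch.

# Hypothetical inference helpers, reusing input_projs / encoder from the pretraining sketch.
@torch.no_grad()
def contextualized_inference(slide_tile_embs, i):
    """slide_tile_embs: (N, TILE_DIMS[i]) all tile embeddings of one WSI from tile encoder phi_i."""
    tokens = input_projs[i](slide_tile_embs).unsqueeze(0)   # (1, N, D_MODEL): the whole slide as one sequence
    return encoder(tokens).squeeze(0)                       # each tile attends to the full slide-level context

@torch.no_grad()
def isolated_inference(tile_emb, i):
    """tile_emb: (TILE_DIMS[i],) a single tile embedding with no slide context available."""
    token = input_projs[i](tile_emb).view(1, 1, -1)         # sequence length 1: the Transformer acts as a deep MLP
    return encoder(token).view(-1)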

Tile-Level Tasks with Slide Context

Table

Tile-Level Tasks without Slide Context

Table

TICON as a Slide-Level Foundation Model

Table

Adapting to Unseen Tile Encoders

Table
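
The teaser figure indicates that adapting TICON to a new tile encoder only introduces a new input projector (dashed lines). The sketch below shows one plausible adaptation setup under that reading; freezing the pretrained TICON weights and reusing the masked-modeling step are our own assumptions, not the paper's stated procedure.

# Hypothetical adaptation to an unseen tile encoder phi_new (embedding dim D_NEW is assumed).
D_NEW = 512
new_input_proj = nn.Linear(D_NEW, D_MODEL)      # the dashed input projector from the teaser figure

# Assumption: the pretrained shared encoder/decoder and existing projectors stay frozen;
# only the new input projector is trained, e.g. by rerunning pretrain_step() with
# input_projs[i] replaced by new_input_proj, so that phi_new embeddings are reconstructed
# into the feature spaces of the pretraining tile encoders.
for module in (encoder, decoder, input_projs, output_projs):
    module.requires_grad_(False)
adapt_optimizer = torch.optim.AdamW(new_input_proj.parameters(), lr=1e-4)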

Visualization

Figure

Visualization of tile classification results on a WSI from the CATCH dataset. The left panel (baseline) shows classification using non-contextual tile embeddings, whereas the right panel (TICON) shows classification with contextualized embeddings. TICON produces less noisy predictions and corrects many local misclassifications (green boxes). However, we also observe shared failure modes (orange box) where both methods misclassify a region, suggesting limitations in the underlying tile encoder's features: they can lack information that even contextualization cannot recover.