Vision-based food image retrieval has garnered significant attention due to its potential for critical applications in dietary and health management. However, food images exhibit more complex feature distributions and lack the geometric regularity and structured patterns typically observed in general image retrieval tasks. This complexity makes it difficult for existing models to extract fine-grained features and semantic information, thereby compromising retrieval performance. To address this challenge, we propose FoodHash, a context-aware proxy interaction and fusion hashing method for food image retrieval. The method incorporates an Aggregation-Interaction-Propagation (AIP) module that, guided by proxy tokens, facilitates contextual information exchange among patch tokens within the same feature map, thereby effectively capturing the intricate details of food images. Furthermore, to leverage the rich semantic information in food images, a Cross-Fusion Module is introduced to efficiently integrate multi-scale information and enhance feature representation. Additionally, we employ a novel loss function that optimizes hash learning by enforcing consistency between the hash codes and the semantic space, thereby strengthening the learning capability of hash coding. Extensive experiments on three publicly available food datasets demonstrate that FoodHash significantly surpasses existing models in retrieval performance. Specifically, on the ETH Food-101 dataset, FoodHash achieves improvements of 18.1%, 6.7%, 5.2%, and 4.5% over the second-best method, PTLCH, for 16-bit, 32-bit, 48-bit, and 64-bit hash codes, respectively. The source code will be made publicly available upon publication of the paper.
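The abstract describes the AIP module only at a high level. As an illustration of the general idea, the following NumPy sketch shows one proxy-guided round in which a small set of proxy tokens first aggregates patch information, the proxies then interact among themselves, and the result is propagated back to the patch tokens. All specifics here (single-head dot-product attention, residual updates, the shapes, and the function names `softmax` and `aip_block`) are our assumptions for illustration, not the paper's actual design:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aip_block(patches, proxies):
    """One illustrative Aggregation-Interaction-Propagation step.

    patches: (N, d) patch tokens from one feature map
    proxies: (M, d) proxy tokens, with M << N
    """
    d = patches.shape[-1]
    # Aggregation: proxies attend to the patch tokens and absorb
    # contextual information from the whole feature map.
    attn_ap = softmax(proxies @ patches.T / np.sqrt(d))   # (M, N)
    proxies = proxies + attn_ap @ patches                 # (M, d)
    # Interaction: proxies exchange information among themselves.
    attn_pp = softmax(proxies @ proxies.T / np.sqrt(d))   # (M, M)
    proxies = proxies + attn_pp @ proxies
    # Propagation: each patch token queries the updated proxies,
    # receiving context from across the feature map.
    attn_pa = softmax(patches @ proxies.T / np.sqrt(d))   # (N, M)
    patches = patches + attn_pa @ proxies                 # (N, d)
    return patches, proxies

rng = np.random.default_rng(0)
out_patches, out_proxies = aip_block(rng.normal(size=(196, 64)),
                                     rng.normal(size=(8, 64)))
```

Because the M proxies mediate all cross-patch communication, the cost per round is O(N·M·d) rather than the O(N²·d) of full patch-to-patch self-attention, which is the usual motivation for proxy-token designs.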