Abstract

We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process while simultaneously relaxing the perfect privacy requirement. In particular, we study the tradeoff between download rate, distortion, and user privacy leakage, and show that, in the limit of large file sizes, this tradeoff can be captured via a novel information-theoretic formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset are unknown, we propose a new deep learning framework that leverages a generative adversarial network approach, allowing the user to learn efficient retrieval schemes directly from the data while minimizing the download cost. We evaluate the performance of the proposed scheme on a synthetic Gaussian dataset as well as on the MNIST, CIFAR-\(10\), and LSUN datasets. For the MNIST, CIFAR-\(10\), and LSUN datasets, the data-driven approach significantly outperforms a non-learning-based scheme that combines source coding with the download of multiple files.