Data Privacy Applications with Monte-Carlo Methods
Publication link: https://doi.org/10.1007/s11222-022-10129-8
Pre-print: https://arxiv.org/abs/2203.13377
Code: https://github.com/barisalparslan1/statistic_selection_and_mcmc
Keywords: Differential privacy, Markov chain Monte Carlo, Fisher information, statistic selection, Bayesian statistics
This work addresses two main problems: (1) Which statistic of the sample should be shared privately? (2) Based on the shared statistic, how can we perform effective Bayesian inference? For the first question, that of statistic selection, we promote using the Fisher information. We find that the statistic that is most informative in a non-private setting may not be the optimal choice under privacy restrictions, and we provide several examples to support this point. We consider several types of data-sharing settings and propose Monte Carlo-based numerical methods for estimating the Fisher information in those settings. For the second question, that of inference, we propose several Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distribution of the parameter given the noisy statistic.

| Model | Type |
|---|---|
| (1) | Additive statistic, Gaussian noise |
| (2) | Additive statistic, Non-Gaussian noise (Laplace) |
| (3) | Non-additive statistic, Non-Gaussian noise (Laplace) |
| (4) | Sequential release instead of a summary statistic, Non-Gaussian noise (Laplace) |
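As a concrete illustration of the selection criterion, the following sketch estimates the Fisher information of a privatized statistic by plain Monte Carlo, in the spirit of Model (1) for a single individual. The population model $x \sim \mathcal{N}(\theta, 1)$, the identity statistic $s(x) = x$, and all names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_loglik(y, theta, sigma_dp, eps):
    """Monte Carlo estimate of log p(y | theta) for the privatized statistic
    y = s(x) + sigma_dp * N(0, 1), with x = theta + eps (i.e. x ~ N(theta, 1))
    and the identity statistic s(x) = x."""
    s = theta + eps                        # simulated statistics s(x)
    dens = np.exp(-0.5 * ((y - s) / sigma_dp) ** 2) / (sigma_dp * np.sqrt(2 * np.pi))
    return np.log(dens.mean())

def mc_fisher_info(theta, sigma_dp, n_outer=2000, n_inner=10000, h=1e-3):
    """Estimate I(theta) = E[(d/dtheta log p(Y | theta))^2]: simulate Y from
    the model, approximate the score by central differences of the Monte Carlo
    log-likelihood (common random numbers keep the difference stable), and
    average the squared score."""
    scores = np.empty(n_outer)
    for k in range(n_outer):
        y = theta + rng.normal() + sigma_dp * rng.normal()  # draw a noisy statistic
        eps = rng.normal(size=n_inner)                      # shared across +/- h
        scores[k] = (mc_loglik(y, theta + h, sigma_dp, eps)
                     - mc_loglik(y, theta - h, sigma_dp, eps)) / (2 * h)
    return np.mean(scores ** 2)

# Sanity check: here y ~ N(theta, 1 + sigma_dp^2), so I(theta) = 1 / (1 + sigma_dp^2).
print(mc_fisher_info(0.0, sigma_dp=1.0))  # approximately 0.5
```

Without privacy noise this statistic attains Fisher information 1; the noise shrinks it to $1/(1+\sigma^{2})$, which is exactly the kind of change that can reorder which statistic is most informative.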
MCMC algorithms used in the project:
| Model | Algorithm |
|---|---|
| Additive statistic, Gaussian noise | MH (Algorithm 4) |
| Additive statistic, Non-Gaussian noise (Laplace) | PMMH (Algorithm 5), MHAAR (Algorithm 6) |
| Non-additive statistic, Non-Gaussian noise (Laplace) | MHAAR (Algorithm 7) |
| Sequential release instead of a summary statistic, Non-Gaussian noise (Laplace) | MHAAR (Algorithm 8) |
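For Model (1), the likelihood of the noisy additive statistic is available in closed form, so a plain random-walk Metropolis-Hastings sampler suffices. Below is a minimal sketch of that case; the $\mathcal{N}(\theta, 1)$ population, the standard normal prior, and all names are assumptions for illustration, not the paper's Algorithm 4.

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_model1(y_hat, n, sigma_dp, n_iter=5000, prop_sd=0.2):
    """Random-walk MH for Model (1): the released statistic is
    y_hat = sum_i x_i + sigma_dp * N(0, 1) with x_i ~ N(theta, 1) i.i.d., so
    y_hat | theta ~ N(n * theta, n + sigma_dp**2) exactly; theta ~ N(0, 1)."""
    def log_post(theta):
        var = n + sigma_dp ** 2
        return -0.5 * theta ** 2 - 0.5 * (y_hat - n * theta) ** 2 / var

    theta = 0.0
    chain = np.empty(n_iter)
    for k in range(n_iter):
        prop = theta + prop_sd * rng.normal()   # random-walk proposal
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop                        # accept
        chain[k] = theta
    return chain

# Example: n = 100 individuals with true theta = 1; release a noisy sum.
x = rng.normal(1.0, 1.0, size=100)
y_hat = x.sum() + 5.0 * rng.normal()
print(mh_model1(y_hat, n=100, sigma_dp=5.0).mean())  # posterior mean near 1
```

For the Laplace-noise models the likelihood of the released statistic is intractable, which is why the pseudo-marginal (PMMH) and averaged-acceptance-ratio (MHAAR) variants in the table are needed.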
Pre-print: https://arxiv.org/abs/2301.08202
Code: https://github.com/sinanyildirim/SMC_DP_adaTr
Keywords: Differential privacy, Bayesian statistics, Sequential Monte Carlo, online learning
We propose a novel online and adaptive truncation method for differentially private Bayesian online estimation of a static parameter regarding a population. We assume that sensitive information from individuals is collected sequentially and the inferential aim is to estimate, on-the-fly, a static parameter regarding the population to which those individuals belong. We employ sequential Monte Carlo to perform the online Bayesian estimation. The idea is that, based on previous observations, we can carefully arrange the interval into which the next individual's information is to be truncated before being perturbed with privacy-preserving noise. In this way, we aim to design predictive queries with small sensitivity, hence small privacy-preserving noise, enabling more accurate estimation while maintaining the same level of privacy.
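The mechanism underlying this is simple: truncating a value to an interval $[a, b]$ bounds the sensitivity of the query by $b - a$, so Laplace noise with scale $(b-a)/\epsilon$ suffices for $\epsilon$-differential privacy, and a narrower interval directly means less noise. A minimal sketch (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def private_release(x, a, b, eps_dp):
    """Truncate x to [a, b], then add Laplace noise with scale (b - a) / eps_dp.
    The truncated query has sensitivity b - a, so the release is eps_dp-DP."""
    return np.clip(x, a, b) + rng.laplace(scale=(b - a) / eps_dp)

# Halving the interval halves the noise, at the cost of more truncation bias.
print(private_release(1.3, a=-3.0, b=3.0, eps_dp=1.0))
print(private_release(1.3, a=-1.5, b=1.5, eps_dp=1.0))
```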
Initialise the estimation system with $\Theta_{0}$ and the first truncation function $s_{1}(\cdot)$; then, for $t = 1, 2, \ldots$, each individual's privatized observation is processed in turn. The method employs a sequential Monte Carlo algorithm for the online Bayesian update (step (3) of the algorithm in the paper), and adaptive truncation based on exploration-exploitation and Thompson sampling using the Fisher information for selecting the next truncation interval (step (4)).
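A toy end-to-end sketch of this loop is below, with a deliberately simplistic truncation rule (an interval centred at the current posterior mean) standing in for the paper's Thompson-sampling rule; the $\mathcal{N}(\theta, 1)$ population, the omission of resampling, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def release(x, a, b, eps_dp):
    """Truncate to [a, b] and add Laplace noise calibrated to sensitivity b - a."""
    return np.clip(x, a, b) + rng.laplace(scale=(b - a) / eps_dp)

def smc_step(particles, logw, y, a, b, eps_dp, n_mc=200):
    """One SMC weight update: each particle theta is weighted by a Monte Carlo
    estimate of the likelihood of the released y under x ~ N(theta, 1),
    truncated to [a, b] and perturbed with Laplace noise."""
    scale = (b - a) / eps_dp
    for i, th in enumerate(particles):
        x_tr = np.clip(rng.normal(th, 1.0, size=n_mc), a, b)
        lik = np.mean(np.exp(-np.abs(y - x_tr) / scale) / (2 * scale))
        logw[i] += np.log(lik + 1e-300)
    return logw

particles = rng.normal(0.0, 3.0, size=500)   # draws from the prior on theta
logw = np.zeros(500)
for t in range(50):                          # stream of individuals, true theta = 1
    x_t = rng.normal(1.0, 1.0)
    w = np.exp(logw - logw.max()); w /= w.sum()
    centre = np.sum(w * particles)           # current posterior mean
    a, b = centre - 3.0, centre + 3.0        # simplistic adaptive interval
    y_t = release(x_t, a, b, eps_dp=1.0)
    logw = smc_step(particles, logw, y_t, a, b, eps_dp=1.0)
    # (a full SMC would resample when the weights degenerate)
w = np.exp(logw - logw.max()); w /= w.sum()
print(np.sum(w * particles))                 # posterior mean estimate, near 1
```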
Pre-print: https://arxiv.org/abs/2301.13778
Code: https://github.com/sinanyildirim/Bayesian_DP_dist_LR
Keywords: Differential privacy, linear regression, distributed learning, MCMC
We propose a novel Bayesian inference framework for distributed differentially private linear regression. We consider a distributed setting where multiple parties hold parts of the data and share certain summary statistics of their portions, perturbed with privacy-preserving noise. We develop a novel generative statistical model for the privately shared statistics, which exploits a useful distributional relation between the summary statistics of linear regression. Bayesian estimation of the regression coefficients is conducted mainly using Markov chain Monte Carlo algorithms, and we also provide a fast version that performs Bayesian estimation in a single iteration. The proposed methods have computational advantages over their competitors, which are state-of-the-art algorithms adapted to the distributed case.
Model: \[ y_{i} = x_{i}^{T}\theta + e_{i}, \quad e_{i} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma_{y}^{2}), \quad i = 1, \ldots, n, \]
Summary statistics: \[ S := X^{T}X, \quad z := X^{T}y = S\theta + X^{T}e. \] Sharing the summary statistics with noise: \[ \hat{S} = S + \sigma_{s}M, \qquad \hat{z} = z + \sigma_{z}v, \quad v \sim \mathcal{N}(0, I_{d}), \] where $M$ is a $d \times d$ symmetric matrix with its upper-triangular elements drawn from $\mathcal{N}(0, 1)$.
In the distributed setting, party $j$ computes its local statistics $S_{j} = X_{j}^{T}X_{j}$ and $z_{j} = X_{j}^{T}y_{j}$ and shares the noisy versions \[ \hat{S}_{j} = S_{j} + \sigma_{s}M_{j}, \qquad \hat{z}_{j} = z_{j} + \sigma_{z}v_{j}, \quad v_{j} \sim \mathcal{N}(0, I_{d}), \] where $M_{j}$ is an independent copy of the symmetric noise matrix $M$ above.
| Model | Algorithm |
|---|---|
| Normally distributed features | MCMC-NormalX |
| Features with general distribution | MCMC-fixedS, Bayes-fixedS-fast, adaSSP, MCMC-B&S |
| Extended state-of-the-art algorithms | MCMC-B&S, adaSSP |
| Fast (one-iteration) algorithms | Bayes-fixedS-fast, adaSSP |
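As a rough analogue of the fast one-iteration approach in the table (in the spirit of Bayes-fixedS-fast, though not the paper's algorithm; the data-generating choices and the eigenvalue clipping below are assumptions), the parties' noisy statistics can be aggregated and solved directly:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_per, J = 3, 500, 4                      # dimension, rows per party, parties
theta_true = np.array([1.0, -2.0, 0.5])
sigma_y, sigma_s, sigma_z = 1.0, 5.0, 5.0    # regression and privacy noise levels

S_hat = np.zeros((d, d))
z_hat = np.zeros(d)
for j in range(J):                           # each party shares noisy statistics
    X = rng.normal(size=(n_per, d))
    y = X @ theta_true + sigma_y * rng.normal(size=n_per)
    M = rng.normal(size=(d, d))
    M = np.triu(M) + np.triu(M, 1).T         # symmetric noise matrix M_j
    S_hat += X.T @ X + sigma_s * M           # accumulate \hat{S}_j
    z_hat += X.T @ y + sigma_z * rng.normal(size=d)  # accumulate \hat{z}_j

# The symmetric noise can break positive definiteness, so clip the eigenvalues
# of S_hat before solving \hat{S} theta = \hat{z}.
eigval, eigvec = np.linalg.eigh(S_hat)
S_psd = (eigvec * np.clip(eigval, 1e-6, None)) @ eigvec.T
theta_hat = np.linalg.solve(S_psd, z_hat)    # one-shot estimate of theta
print(theta_hat)                             # close to theta_true
```

The paper's MCMC algorithms instead perform full Bayesian inference for the regression parameters given the noisy statistics.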