
Overview

Welcome to my website! My name is Joe Feldman and I am a postdoctoral associate in the Department of Statistical Science at Duke University, where I am supervised by Jerry Reiter. I completed my Ph.D. in May 2023 in the Department of Statistics at Rice University, where my adviser was Dan Kowal.

I am interested in developing Bayesian methodology for complex and high-dimensional data sets, with applications in public health, sociology, and economics. Broadly, my research has three themes:

  1. Analysis of Missing Data: Missing values are commonplace in modern data sets, especially when they are built by linking information across multiple sources. We are broadly interested in developing flexible Bayesian joint models for imputing missing values that are compatible with mixed data types and potentially nonignorable missingness.

  2. Data Privacy: We aim to make information obtained from confidential data sources accessible to the broader public without jeopardizing privacy. One way to accomplish this is through data synthesis, which fits a probabilistic generative model to the confidential data and then simulates from that model to create synthetic data. Ideally, the model is flexible enough to capture both univariate and multivariate features of the data, while the synthetic data maintains no correspondence with observations in the original data set.

  3. Interpretable Machine Learning: We leverage Bayesian decision theory to provide simple, “near-optimal” summaries of black-box predictive models. We can also use these techniques to perform high-dimensional variable selection, which enables inference on the relationships between covariates and response.

New Paper Spotlight! Differentially Private Bayesian Inference for Gaussian Copula Correlations

Deep generative models offer powerful tools for multivariate data analysis, but their black-box architectures are often unidentified and difficult to interpret. We introduce the Deep Discrete Encoder (DDE) Copula, an identifiable and interpretable generative model for multivariate data with arbitrary marginal distributions. The model places a hierarchical directed network of binary latent variables inside a copula framework, enabling flexible dependence modeling for mixed discrete and continuous data. Estimation is based on rank likelihoods, which decouple marginal modeling from posterior inference on the DDE parameters and avoid specifying the marginal distributions. We establish conditions for identification of the DDE copula parameters, ensuring that layer-specific parameters provide meaningful summaries of multivariate dependence. We also prove quotient-space posterior consistency for continuous margins under the exact rank likelihood and treat the extended rank likelihood for tied or mixed margins as a generalized likelihood, with concentration under an additional contrast condition. For computation, we propose a stochastic expectation-maximization algorithm for maximum a posteriori estimation, together with initialization strategies that improve convergence. To learn network dimension adaptively, we extend Bayesian rank-selection priors to infer layer-specific widths. Simulations show strong finite-sample performance, and a personality-survey analysis reveals interpretable hierarchical latent structure in complex multivariate data.