Publications (# indicates corresponding author)

Zhang, H. and Wang, H. (2026). Refitted cross-validation estimation for high-dimensional subsamples from low-dimension full data. Computational Statistics, 41, 1-15.
Zhang, H., Zheng, Y., Hou, L. and Liu, L. (2025). HIMA: An R package for high-dimensional mediation analysis. Journal of Data Science, DOI: 10.6339/25-JDS1192. [R package “HIMA”].
Liu, L., Zhang, H., Zhang, K., Zheng, Y., Gao, T., Zheng, C., Hou, L. and Liu, L. (2025). High-dimensional mediation analysis for longitudinal mediators and survival outcomes. Briefings in Bioinformatics, 26, 1-11.
Bai, X. and Zhang, H.# (2025). An online updating approach for estimating and testing mediation effects with big data streams. Statistics and Computing, 35, 1-17. (The first author is a Master student under my supervision).
Zhang, H., Li, Y. and Wang, H. (2025). DsubCox: A fast subsampling algorithm for Cox model with distributed and massive survival data. International Journal of Biostatistics, 21, 53-65.
Zhang, H. (2025). Efficient adjusted joint significance test and Sobel-type confidence interval for mediation effect. Structural Equation Modeling: A Multidisciplinary Journal, 32, 93-104. [R package “AdjMed”].
Bai, X., Zheng, Y., Hou, L., Zheng, C., Liu, L., and Zhang, H.# (2025). An efficient testing procedure for high-dimensional mediators with FDR control. Statistics in Biosciences, 17, 615-629. (The first author is a Master student under my supervision).
Getz, K., Jeon, M., Liu, L., Liu, L., Zhang, H., Luo, C., Luo, J., and Toriola, A. (2025). Metabolites and lipid species mediate the associations of adiposity in childhood and early adulthood with mammographic breast density in premenopausal women. Breast Cancer Research, 27, 1-12.
Zhang, H., Hong, X., Zheng, Y., Hou, L., Zheng, C., Wang, X. and Liu, L. (2024). High-dimensional quantile mediation analysis with application to a birth cohort study of mother-newborn pairs. Bioinformatics, 40, 1-8.
Zhang, H., Zuo, L., Wang, H. and Sun, L. (2024). Approximating partial likelihood estimators via optimal subsampling. Journal of Computational and Graphical Statistics, 33, 276-288.
Wang, T., Zhang, H.# and Sun, L. (2024). Renewable learning for multiplicative regression model with streaming datasets. Computational Statistics, 39, 1559-1586. (The first author is a Master student under my supervision).
Shi, Y., Liu, L., Chen, J., Wylie, K., Wylie, T., Stout, M., Wang, C., Zhang, H., Shih, T., Xu, X., Zhang, A., Park, S., Jiang, H. and Liu, L. (2024). Simplified methods for variance estimation in microbiome abundance count data analysis. Frontiers in Genetics, 15, 1-22.
An, M. and Zhang, H.# (2023). High-dimensional mediation analysis for time-to-event outcomes with additive hazards model. Mathematics, 11, 1-11. (The first author is a Master student under my supervision).
Shi, Y., Li, H., Wang, C., Chen, J., Jiang, H., Shih, T., Zhang, H., Song, Y., Feng, Y. and Liu, L. (2023). A flexible quasi-likelihood model for microbiome abundance count data. Statistics in Medicine, 42, 4632-4643.
Hou, L., Zhang, H., Hou, Q., Guo, A., Wu, O., Zhang, J. and Yu, T. (2023). SARW: Similarity-Aware random walk for GCN. Intelligent Data Analysis, 27, 1615-1636.
Zhang, H. and Li, X. (2023). A framework for mediation analysis with massive data. Statistics and Computing, 33, 1-16.
Zhang, M., Zhang, Y., Zhang, W., Zhao, L., Jing, H., Wu, X., Guo, L., Zhang, H., Zhang, Y., Zhu, S., Zhang, S. and Zhang, X. (2023). Postponing colonoscopy for 6 months in high‐risk population increases colorectal cancer detection in China. Cancer Medicine, 12, 11816-11827.
Perera, C., Zhang, H., Zheng, Y., Hou, L., Qu, A., Zheng, C., Xie, K. and Liu, L. (2022). HIMA2: high-dimensional mediation analysis and its application in epigenome-wide DNA methylation data. BMC Bioinformatics, 23:296.
Zhang, H., Hou, L. and Liu, L. (2022). A review of high-dimensional mediation analyses in DNA methylation studies. In Guan, Weihua (Ed.), Epigenome-Wide Association Studies: Methods and Protocols, 2432, 123-135.
Wang, T. and Zhang, H.# (2022). Optimal subsampling for multiplicative regression with massive data. Statistica Neerlandica, 76, 418-449. (The first author is a Master student under my supervision).
Liu, J. and Zhang, H.# (2022). First-order random coefficient INAR process with dependent counting series. Communications in Statistics: Simulation and Computation, 51, 3341-3354. (The first author is a Master student under my supervision).
Zhang, H., Huang, J. and Sun, L. (2022). Projection-based and cross-validated estimation in high-dimensional Cox model. Scandinavian Journal of Statistics, 49, 353-372.
Li, C., Zhang, H.# and Wang, D. (2022). Modelling and monitoring of INAR(1) process with geometrically inflated Poisson innovations. Journal of Applied Statistics, 49, 1821-1847.
Zheng, Y., Joyce, B., Hwang, S., Ma, J., Liu, L., Allen, N., Krefman, A., Wang, J., Gao, T., Nannini, D., Zhang, H., Jacobs, D., et al. (2022). Association of cardiovascular health through young adulthood with genome-wide DNA methylation patterns in midlife: The Coronary Artery Risk Development in Young Adults (CARDIA) Study. Circulation, 146, 94-109.
Zhang, H. and Wang, H. (2021). Distributed subdata selection for big data via sampling-based approach. Computational Statistics and Data Analysis, 153, 1-19.
Zuo, L., Zhang, H.#, Wang, H. and Liu, L. (2021). Sampling-based estimation for massive survival data with additive hazards model. Statistics in Medicine, 40, 441-450. (The first author is a Master student under my supervision).
Zhang, H., Zheng, Y., Hou, L., Zheng, C. and Liu, L. (2021). Mediation analysis for survival data with high-dimensional mediators. Bioinformatics, 37, 3815-3821.
Zuo, L., Zhang, H.#, Wang, H. and Sun, L. (2021). Optimal subsample selection for massive logistic regression with distributed data. Computational Statistics, 36, 2535-2562. (The first author is a Master student under my supervision).
Zhang, H., Chen, J., Feng, Y., Wang, C., Li, H. and Liu, L. (2021). Mediation effect selection in high-dimensional and compositional microbiome data. Statistics in Medicine, 40, 885-896.
Wang, Y. and Zhang, H.# (2021). Some estimation and forecasting procedures in Poisson-Lindley INAR(1) process. Communications in Statistics: Simulation and Computation, 50, 49-62. (The first author is a Master student under my supervision).
Zhang, H., Chen, J., Li, Z. and Liu, L. (2021). Testing for mediation effect with application to human microbiome data. Statistics in Biosciences, 13, 313-328.
Dang, Y., Wang, R., Qian, K., Lu, J., Zhang, H. and Zhang, Y. (2021). Clinical and radiological predictors of epidermal growth factor receptor mutation in nonsmall cell lung cancer. Journal of Applied Clinical Medical Physics, 22, 271-280.
Zhang, M., Zhao, L., Zhang, Y., Jing, H., Wei, L., Li, Z., Zhang, H., Zhang, Y., Zhu, S., Zhang, S. and Zhang, X. (2022). Colorectal cancer screening with high risk-factor questionnaire and fecal immunochemical tests among 5,947,986 asymptomatic population: a population-based study. Frontiers in Oncology, 12, 1-19.
Zhang, H., Huang, J. and Sun, L. (2020). A rank-based approach to estimating monotone individualized two treatment regimes. Computational Statistics and Data Analysis, 151, 1-12.
Wang, X., Wang, D. and Zhang, H. (2020). Poisson autoregressive process modeling via the penalized conditional maximum likelihood procedure. Statistical Papers, 61, 245-260.
Zhang, H.#, Wang, D. and Sun, L. (2017). Regularized estimation in GINAR(p) process. Journal of the Korean Statistical Society, 46, 502-517.
Zhang, H., Sun, L., Zhou, Y. and Huang, J. (2017). Oracle inequalities and selection consistency for weighted lasso in high-dimensional additive hazards model. Statistica Sinica, 27, 1903-1920.
Zhou, J., Zhang, H.#, Sun, L. and Sun, J. (2017). Joint analysis of panel count data with informative observation process and a dependent terminal event. Lifetime Data Analysis, 23, 560-584.
Zhang, H., Zheng, Y., Yoon, G., Zhang, Z., Gao, T., Joyce, B., Zhang, W., Schwartz, J., Vokonas, P., Colicino, E., Baccarelli, A., Hou, L. and Liu, L. (2017). Regularized estimation in sparse high-dimensional multivariate regression, with application to a DNA methylation study. Statistical Applications in Genetics and Molecular Biology, 16, 159-171.
Fang, S., Zhang, H.#, Sun, L. and Wang, D. (2017). Analysis of panel count data with time-dependent covariates and informative observation process. Acta Mathematicae Applicatae Sinica, English Series, 33, 147-156.
Yoon, G., Zheng, Y., Zhang, Z., Zhang, H., Gao, T., Joyce, B., Zhang, W., Guan, W., Baccarelli, A., Jiang, W., Schwartz, J., Vokonas, P., Hou, L. and Liu, L. (2017). Ultra-high dimensional variable selection with application to normative aging study: DNA methylation and metabolic syndrome. BMC Bioinformatics, 18, 1-7.
Fang, S., Zhang, H. and Sun, L. (2016). Joint analysis of longitudinal data with additive mixed effect model for informative observation times. Journal of Statistical Planning and Inference, 169, 43-55.
Zhang, H., Zheng, Y., Zhang, Z., Gao, T., Joyce, B., Yoon, G., Zhang, W., Schwartz, J., Just, A., Colicino, E., Vokonas, P., Zhao, L., Lv, J., Baccarelli, A., Hou, L. and Liu, L. (2016). Estimating and testing high-dimensional mediation effects in epigenetic studies. Bioinformatics, 32, 3150-3154.
Liu, Y., Wang, D., Zhang, H. and Shi, N. (2016). Bivariate zero truncated Poisson INAR(1) process. Journal of the Korean Statistical Society, 45, 260-275.
Zhang, H. and Wang, D. (2015). Inference for random coefficient INAR(1) process based on frequency domain analysis. Communications in Statistics: Simulation and Computation, 44, 1078-1100.
Li, C., Wang, D. and Zhang, H. (2015). First-order mixed integer-valued autoregressive processes with zero-inflated generalized power series innovations. Journal of the Korean Statistical Society, 44, 232-246.
Jia, B., Wang, D. and Zhang, H. (2014). A study for missing values in PINAR(1) processes. Communications in Statistics: Theory and Methods, 43, 4780-4789.
Zhang, H., Zhao, H., Sun, J., Wang, D. and Kim, K. (2013). Regression analysis of multivariate panel count data with an informative observation process. Journal of Multivariate Analysis, 119, 71-80.
Zhang, H., Sun, J. and Wang, D. (2013). Variable selection and estimation for multivariate panel count data via the seamless L0 penalty. The Canadian Journal of Statistics, 41, 368-385.
Zhang, H., Wang, D. and Zhu, F. (2012). Generalized RCINAR(1) process with signed thinning operator. Communications in Statistics: Theory and Methods, 41, 1750-1770.
Zhang, H., Wang, D. and Zhu, F. (2011). Empirical likelihood inference for random coefficient INAR(p) process. Journal of Time Series Analysis, 32, 195-203.
Zhang, H., Wang, D. and Zhu, F. (2011). The empirical likelihood for first-order random coefficient integer-valued autoregressive processes. Communications in Statistics: Theory and Methods, 40, 492-509.
Wang, D. and Zhang, H. (2011). Generalized RCINAR(p) process with signed thinning operator. Communications in Statistics: Simulation and Computation, 40, 13-44.
Zhang, H., Wang, D. and Zhu, F. (2010). Inference for INAR(p) processes with signed generalized power series thinning operator. Journal of Statistical Planning and Inference, 140, 667-683.

Haixiang Zhang
