
Astrostatistics sessions at JSM 2020

INDEX | Aug 3: AIG meeting | Aug 3, Session 11: Solar & Geo | Aug 4, Session 253: Large Public Data | Aug 4, Session 265: Astro & Space Physics | Aug 5, Session 401: Student Paper Award | Aug 6, Session 431: Astronomical(ly) Big Data | Aug 6, Session 539: Signal detection | Other events

We will use an Astrostatistics Interest Group Slack channel to ease communication between the audience and the speakers at the various sessions. To be added to this channel, please contact one of the office bearers or write to aigamstat @ gmail.

AIG Annual Meeting

Mon Aug 3, 2020, 1:00pm-2:30pm EDT

The business meeting of the AIG was held on the afternoon of Aug 3 via Zoom.

Summary (PDF)

Agenda

Overview and Census
Charter
- Amendments on language, Program Chair term, and Webmaster
- Elections for next year
- Section vs Interest Group
Activities
- Sessions during this JSM
- Student Paper Award
- AIG Virtual Table
- AIG Mixer on Wed Aug 5, 5-6pm via Zoom; see the Slack channel for connection information
- Planning for next year
- Other events
External Coordination
Web presence
- Website, contact email, and email exploder
- Social media: Slack, Twitter, etc.
- Logo


Session 11

Mon Aug 3, 2020, 10:00am - 11:50am EDT

219357 Statistical Inference for Solar and Geophysical Data - Invited Papers

Section on Physical and Engineering Sciences, Section on Nonparametric Statistics, Astrostatistics Special Interest Group
Organizer(s): Gwendolyn M Eadie, University of Toronto
Chair(s): David van Dyk, Imperial College London

10:05 AM Multitaper Analysis of High-Q Spectral Peaks and Non-stationarity in the Geomagnetic Field over the 400-4000 microHz Band
Alan Chave, Woods Hole Oceanographic Institution
Three 60-day sections of geomagnetic data from Honolulu Observatory during 2001-2002 were analyzed using multitaper spectral analysis, showing the ubiquitous presence of narrowband, highly statistically significant, high-Q features in multitaper power spectra and pervasive non-stationarity, as measured by the frequency offset coherence, over 400-4000 microHz. The peak frequencies correlate well with the optically measured frequencies of solar p-modes, and the raw Qs are defined by the resolution bandwidths of the estimates, with values ranging from the 100s to the 1000s. Further, spectral peaks are consistently coherent across frequency due to non-stationarity, and frequently exhibit cyclostationarity at offset frequencies of ±0.5 and ±1 cycles per day (cpd).
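For readers unfamiliar with the multitaper method named in this abstract, the sketch below shows only the most basic estimator: taper the series with several orthogonal Slepian (DPSS) sequences and average the resulting periodograms. It is an illustration written for this page, not the analysis from the talk; it omits adaptive weighting, F-tests for line components, and the frequency offset coherence. The function names `dpss_tapers` and `multitaper_psd` and the default time-bandwidth product `nw=4` are hypothetical choices.

```python
import numpy as np


def dpss_tapers(n, nw, k):
    """Return k discrete prolate spheroidal (Slepian) tapers of length n.

    The tapers are the leading eigenvectors of the standard symmetric
    tridiagonal matrix for Slepian sequences; W = nw / n is the
    half-bandwidth in cycles per sample.
    """
    w = nw / n
    idx = np.arange(n)
    diag = ((n - 1 - 2 * idx) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = idx[1:] * (n - idx[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)        # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T        # k most concentrated tapers, unit energy


def multitaper_psd(x, dt=1.0, nw=4.0, k=None):
    """Basic multitaper PSD: the unweighted mean of K tapered periodograms."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    k = int(2 * nw) - 1 if k is None else k
    tapers = dpss_tapers(n, nw, k)                      # shape (k, n)
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=dt)
    return freqs, dt * eigenspectra.mean(axis=0)


if __name__ == "__main__":
    # Synthetic check: a unit sinusoid at 0.1 cycles/sample in white noise.
    rng = np.random.default_rng(0)
    t = np.arange(512)
    x = np.sin(2 * np.pi * 0.1 * t) + rng.normal(size=t.size)
    f, s = multitaper_psd(x)
    print("peak frequency:", f[np.argmax(s)])
```

Averaging K = 2NW - 1 nearly independent eigenspectra trades a resolution bandwidth of about ±W for a large reduction in variance, which is why narrow, high-Q peaks can be tested for significance against the background.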
10:30 AM Hitting a Moving Target: Modelling Non-Stationary Relationships in Geomagnetism
David Riegert, Queen’s University; David J Thomson, Queen’s University
This talk focuses on modelling the relationship between Earth’s magnetic field as a predictor and induced currents in the ground as a response, a field of study known as magnetotellurics. Current modelling approaches assume a stationary relationship; however, energy transfer between frequencies, which indicates non-stationarity in the process, has been investigated in the univariate setting using seismic, ocean pressure (Chave et al., 2019), and geomagnetic measurements (Chave et al., 2018; Riegert & Thomson, 2018). Non-stationarity in a time-series process provides strong evidence for a non-stationary relationship between that series and any other. Current models are discussed and an extension is introduced which aims to account for violations of the assumption of stationarity.
10:55 AM Solar flare prediction with machine learning
Yang Chen, University of Michigan
We present our machine learning efforts, which show great promise for early prediction of solar flare events. First, we present a data pre-processing pipeline that is built to extract useful data from multiple sources – Geostationary Operational Environmental Satellites (GOES) and Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI) and SDO/Atmospheric Imaging Assembly (AIA) – to prepare inputs for machine learning algorithms. Second, we adopt deep learning algorithms to extract/select features from raw HMI and AIA data. Third, we train machine learning models that capture both the spatial and temporal information from HMI magnetogram data for strong/weak flare classification and for predictions of flare intensities. Fourth, we show that the ML-derived features perform almost as well as the active region parameters provided in HMI data files, i.e., features constructed manually from physical principles. Last, case studies show a significant increase in the prediction score around 20 hours before strong solar flare events, which implies that early precursors appear at least 20 hours prior to the peak of a flare event.
11:20 AM Effect of Systematic Uncertainties on Density and Temperature Estimates in Coronae of Capella
Xixi Yu, Imperial College London; David van Dyk, Imperial College London; David Stenning, Imperial College London; Vinay Kashyap, Center for Astrophysics | Harvard & Smithsonian; Giulio Del Zanna, Centre for Mathematical Sciences, University of Cambridge
Information about the physical properties of astrophysical objects cannot be measured directly but is inferred by interpreting spectroscopic observations in the context of atomic physics calculations. A critical component of this analysis is understanding how uncertainties in the underlying atomic physics propagate to the uncertainties in the inferred plasma parameters. Rather than the standard approach commonly deployed by astrophysicists, which treats the uncertainty as fixed and known and obtains best-fit values of the parameters, we propose a multistage analysis that prevents underestimation of the error bars on the model parameters and increases the accuracy of the results. A case study on Fe XVII and O VII/VIII is discussed in which we implement both a pragmatic Bayesian method, where the atomic physics information is unaffected by the observed data, and a fully Bayesian method, where the data can be used to probe the physics; in particular, we detail a method of summarizing atomic uncertainties using principal components analysis.
11:45 AM Floor Discussion


Session 253

Tue Aug 4, 2020, 1:00pm - 2:50pm EDT

219418 Innovations in AstroStatistics on Exploring Large Public Data – Invited Papers

Astrostatistics Special Interest Group, Section on Physical and Engineering Sciences, Section on Statistical Learning and Data Science
Organizer(s): Hyungsuk Tak, Pennsylvania State University
Chair(s): Hyungsuk Tak, Pennsylvania State University

1:05 PM Handling Model Uncertainty via Smoothed Inference [Slides]
Sara Algeri, University of Minnesota
Classical inferential methods often rely on the assumption that one among the models specified under the null or alternative hypothesis provides a suitable representation of the data under study. Unfortunately, when conducting searches for new physics, the specification of a correct model for the data is not always an easy task. Consequently, the validity and the sensitivity of the experiment under study may be substantially compromised. Algeri (2020) introduced a novel statistical approach to perform modeling, estimation, and inference under background mismodeling for large samples in the continuous setting. This work aims to extend the framework proposed in Algeri (2020) to arbitrary large samples from continuous or discrete distributions.
1:30 PM Improving Exoplanet Detection Power: Multivariate Gaussian Process Models for Stellar Activity
David Edward Jones, Texas A&M University; David Stenning, Imperial College London; Eric B Ford, Penn State University; Robert L Wolpert, Duke University; Thomas J Loredo, Cornell University; Xavier Dumusque, Observatoire Astronomique de l’Universite de Geneve
The radial velocity technique is one of the two main approaches for detecting planets outside our solar system, often referred to as exoplanets. When a planet orbits a star, its gravitational force causes the star to move, which induces a Doppler shift (i.e., the starlight appears redder or bluer than expected); it is this effect that the radial velocity method attempts to detect. Unfortunately, these Doppler signals are typically contaminated by various stellar activity phenomena, such as dark spots on the star surface. We propose a Gaussian process modeling framework to capture this stellar activity and thereby improve detection power for low-mass planets (e.g., Earth-like planets). Our approach builds on previous work in two ways: (i) we use dimension reduction techniques to construct data-driven stellar activity proxies, as opposed to using traditional activity proxies; (ii) we extend the multivariate Gaussian process model of Rajpaul et al. (2015) to a class of models and use a large-scale model selection procedure to find the best model for the particular proxies at hand. Our method results in substantially improved power for planet detection.
1:55 PM Disentangling Stellar Activity and Planetary Signals using Bayesian High-dimensional Analysis [Slides]
Bo Ning, Yale University; Jessi Cisewski-Kehe, Yale University; Allen Davis, Yale University; Sarah Dodson-Robinson, University of Delaware; Debra Fischer, Yale University; Parker Holzer, Yale University; Alexander Wise, Penn State University
With the development of third-generation high-precision spectrometers (e.g., the EXtreme PREcision Spectrometer, EXPRES), stellar activity has become the dominant source of background noise, which can lead to false discoveries or poor mass estimates of small planets. Recent efforts have focused on identifying stellar activity-sensitive lines in a given set of spectra. Since a typical spectrum contains ~10^5 features, finding these lines with the line-by-line search approaches proposed so far can be challenging and time-consuming. In this talk, a Bayesian variable selection method is introduced to automatically search for activity-sensitive lines through the pixels of a set of spectra. We applied this method to the spectra of alpha Centauri B from HARPS, with promising results: we identified not only many well-known activity-sensitive lines but also several new ones. With stellar activity being the largest source of variability for next-generation RV spectrographs, this work is a step toward accessing the wealth of information available in high-precision spectra.
2:20 PM Floor Discussion


Session 265

Tue Aug 4, 2020, 1:00pm - 2:50pm EDT

219617 Innovations in Statistics for Astronomy & Space Physics – Topic Contributed Papers

SSC (Statistical Society of Canada), Section on Physical and Engineering Sciences, Astrostatistics Special Interest Group
Organizer(s): Gwendolyn M Eadie, University of Toronto
Chair(s): David J Thomson, Queen’s University

1:05 PM The Photometric LSST Astronomical Time Series Classification Challenge (PLAsTiCC)
Renee Hlozek, University of Toronto
The Legacy Survey of Space and Time (LSST) on the Rubin Observatory will generate a data deluge: millions of astrophysical transients and variable sources will need to be classified from their time series light curves alone. Photometric classification has long been a problem of interest in the astronomical community, but the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) brings a wide range of models together, simulated under LSST-like observing conditions for the first time. PLAsTiCC was delivered to the community through a Kaggle challenge, designed to stimulate interest in time-series photometric classification and deliver methodologies that will advance the LSST science case. We will give an overview of the road to PLAsTiCC, present and analyze the results of the PLAsTiCC challenge and metrics used to evaluate the challenge, and discuss lessons learned in presenting science-specific challenges to the broader computational and statistical communities.
1:25 PM Gibbs Point Process Model for Objects in the Star Formation Complexes of M33
Dayi Li, Western University; Pauline Barmby, Western University; Ian McLeod, Western University
We demonstrate the power of Gibbs point process models from the spatial statistics literature when applied to stellar population studies. We conduct a rigorous analysis of the empirical spatial distributions of objects in the star formation complexes of M33, including giant molecular clouds (GMCs) and young stellar cluster candidates (YSCCs). We choose a hierarchical model structure from GMCs to YSCCs based on the natural formation hierarchy between them. This approach circumvents the limitations of empirical two-point correlation function analysis by naturally accounting for the inhomogeneity present in the distribution of YSCCs. We also investigate the effects of GMC properties on their spatial distributions. We confirm that the distributions of GMCs and YSCCs are highly correlated. We find that, compared to a Poisson process, the clustering of YSCCs peaks at scales of ~250 pc, and that this clustering occurs mainly in regions at galactocentric distances ≳ 4.5 kpc. Furthermore, the galactocentric distance and the mass of GMCs have a strong positive effect on the correlation strength between GMCs and YSCCs.
1:45 PM Likelihood-free Inference of Chemical Homogeneity in Open Clusters [Slides]
Aarya Patil; Jo Bovy, University of Toronto
Stellar clusters are excellent astrophysical laboratories because, under the standard assumption that their stars were born out of the same molecular cloud at the same time, we expect stars in a cluster to have similar chemistry. Recent efforts to constrain the initial abundance spread of different elements in open clusters have employed Approximate Bayesian Computation (ABC) to approximate the posterior probability distribution of the scatter in each element using stellar spectral data. Density-estimation likelihood-free inference methods turn inference into a density estimation task and give orders-of-magnitude improvements over traditional ABC approaches. We illustrate accurate and efficient inference of elemental abundances on a set of synthetic spectra using an ensemble of Neural Density Estimators. We use compression to tackle the curse of dimensionality and to remove instrumental noise in the APOGEE spectral data. We believe that fast, high-fidelity posterior inference will bring the power of differential abundances to stars in large spectroscopic surveys and help unravel the history of star formation and chemical enrichment in the Milky Way through chemical tagging.
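To make the "traditional ABC" baseline in this abstract concrete, here is a toy rejection-ABC sketch for the intrinsic scatter of a single element in one cluster. The model is a deliberately simplified assumption (Gaussian abundances with known measurement noise, sample standard deviation as the only summary statistic), not the spectral model or the neural density estimators used in the talk; the function name `abc_rejection_scatter` and all numerical settings are hypothetical.

```python
import numpy as np


def abc_rejection_scatter(obs, noise_sd, n_draws=20000, keep=200,
                          prior_max=0.2, seed=None):
    """Toy rejection ABC for the intrinsic abundance scatter of one cluster.

    Hypothetical model: measured abundances are N(cluster_mean,
    scatter**2 + noise_sd**2) with known noise_sd. The summary statistic is
    the sample standard deviation, which does not depend on cluster_mean,
    so the simulations can use a mean of zero. Returns the `keep` prior draws
    whose simulated summaries are closest to the observed one.
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float)
    s_obs = obs.std(ddof=1)
    scatter = rng.uniform(0.0, prior_max, n_draws)         # draws from a uniform prior
    total_sd = np.sqrt(scatter ** 2 + noise_sd ** 2)
    sims = rng.normal(0.0, total_sd[:, None], size=(n_draws, obs.size))
    s_sim = sims.std(axis=1, ddof=1)
    closest = np.argsort(np.abs(s_sim - s_obs))[:keep]     # accept the nearest draws
    return scatter[closest]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "cluster": 100 stars, true scatter 0.05 dex, noise 0.02 dex.
    obs = rng.normal(0.0, np.sqrt(0.05 ** 2 + 0.02 ** 2), size=100)
    post = abc_rejection_scatter(obs, noise_sd=0.02, seed=1)
    print("approximate posterior mean:", post.mean(), "sd:", post.std())
```

Every accepted sample costs many full simulations, and the approximation depends on a hand-chosen summary and tolerance; density-estimation approaches aim to avoid this waste by learning the posterior or likelihood from all simulations.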
2:05 PM Bayesian Inference and Computation for Old Star Clusters
Gwendolyn M Eadie, University of Toronto; Jeremy Webb, University of Toronto; Jeffrey Rosenthal, University of Toronto
Globular Clusters (GCs) are astronomical objects made up of tens of thousands to hundreds of thousands of stars. GCs are some of the oldest objects in the universe and are incredibly spatially dense, making them interesting laboratories for studying stellar populations. In particular, estimates of a GC’s mass as a function of radius can be used to test theories about GC evolution. However, the high spatial density of GCs is both a blessing and a curse: there is a large population of stars to observe in the outer regions of a GC, but it is impossible to discern individual stars in the inner regions because of extreme crowding. Thus, astronomers usually estimate a GC’s mass as a function of radius by first estimating the total light in radial bins, and then assuming a mass-to-light ratio. I will present a Bayesian approach that eliminates both the need to bin the data and the assumption about the mass-to-light ratio, and that instead takes advantage of position and velocity information from a sample of individual stars. I will also discuss the statistical and computational challenges we face while including measurement uncertainties, projection effects, and incomplete data.
2:25 PM Statistical Characterization of Matrix Effects in Laser-Induced Breakdown Spectroscopy
David Stenning, Simon Fraser University
Scientists often model complex physics using computer simulations. Such simulations complicate statistical inference because the resulting likelihood function cannot be directly evaluated and a single simulation run may take minutes to hours on supercomputers. One example from astrophysics is in the area of stellar evolution, whereby computer simulators are used to predict the brightness of a star in several wide wavelength bands given a set of parameters that describe physical properties of the star (e.g., age, chemical composition, distance from Earth, etc.). Another example comes from simulating plasmas generated by Laser-Induced Breakdown Spectroscopy, a technique used by the ChemCam instrument on the Mars Science Laboratory rover Curiosity, to aid in determining the composition of rocks and soils on Mars. This talk will address the novel statistical challenges that arise when combining such simulations with observational or experimental data for inference, using examples from recent astrostatistical analyses.
2:45 PM Floor Discussion


Session 401

Wed Aug 5, 2020, 1:00pm - 2:50pm EDT

219559 Astrostatistics Interest Group: Student Paper Award – Topic Contributed Papers

Information about the Student Paper competition

Astrostatistics Special Interest Group
Organizer(s): Gwendolyn M Eadie, University of Toronto
Chair(s): Chad Schafer, Carnegie Mellon University

1:05 PM Photometric Biases in Modern Astronomical Surveys [Slides]
Joshua Speagle, Harvard University
Many modern astronomical surveys use maximum-likelihood (ML) methods to fit models when extracting photometry from images. We show that these ML estimators systematically overestimate the flux as a function of the signal-to-noise ratio and the number of model parameters involved in the fit. This bias is substantially worse for resolved sources: while a 1% bias is expected for a 10-sigma point source, a 10-sigma resolved galaxy with a simplified Gaussian profile suffers a 2.5% bias. This bias also behaves differently depending on how multiple bands are used in the fit: simultaneously fitting all bands distributes the flux bias roughly evenly among them, while fixing the position in “non-detection” bands (i.e., forced photometry) gives flux estimates in those bands that are biased low, compounding a bias in derived colors. We show that these effects are present in idealized simulations, outputs from the Hyper Suprime-Cam fake object pipeline (SynPipe), and observations from Sloan Digital Sky Survey Stripe 82, implying they are present in numerous astronomical datasets widely used today.
1:25 PM Trend Filtering: A Modern Statistical Tool for Time-Domain Astronomy and Astronomical Spectroscopy [Slides]
Collin Politsch, Carnegie Mellon University; Jessi Cisewski-Kehe, Yale University; Larry Wasserman, Carnegie Mellon University; Rupert Croft, Carnegie Mellon University
The problem of denoising a one-dimensional signal possessing varying degrees of smoothness is ubiquitous in time-domain astronomy and astronomical spectroscopy. In this work, we introduce trend filtering into the astronomical literature. Trend filtering is a modern nonparametric statistical tool that yields significant improvements in the broad problem space of denoising spatially heterogeneous signals. When the underlying signal is spatially heterogeneous, trend filtering is superior to any statistical estimator that is a linear combination of the observed data, including kernel smoothers, LOESS, smoothing splines, and Gaussian process regression. Furthermore, the trend filtering estimate can be computed with practical and scalable efficiency via a specialized convex optimization algorithm. To illustrate the broad utility of trend filtering, we discuss its relevance to a diverse set of spectroscopic and time-domain studies. The observations we discuss are (1) the Lyman-alpha forest of quasar spectra; (2) more general spectroscopy of quasars, galaxies, and stars; (3) stellar light curves with transiting exoplanet(s); (4) eclipsing binary light curves; and (5) supernova light curves.
1:45 PM Galaxy Cluster Mass Estimation Using Deep Learning [Slides]
Matthew Ho, McWilliams Center for Cosmology; Arya Farahi, Michigan Institute for Data Science; Michelle Ntampaka, Harvard-Smithsonian Center for Astrophysics; Markus Michael Rau, McWilliams Center for Cosmology; Hy Trac, McWilliams Center for Cosmology; Barnabas Poczos, School of Computer Science
Utilizing galaxy cluster abundance in precision cosmology requires large, well-defined cluster samples and robust mass measurement methods. In addition, modern cluster measurement techniques are expected to place a strong emphasis on efficiency and automation, as the wealth of detailed cluster data is expected to greatly increase with current and upcoming surveys such as DES, LSST, WFIRST, Euclid, and eROSITA. In this talk, I will discuss how we can leverage deep learning models to infer dynamical cluster masses from spectroscopic samples with high precision and computational efficiency. I will demonstrate the ability of Convolutional Neural Networks (CNNs) to mitigate systematics in the virial scaling relation and produce dynamical mass estimates of galaxy clusters, using projected galaxies, with remarkably low bias and scatter. I will then discuss the performance of these methods relative to other leading analytic and machine learning dynamical mass estimators. Lastly, I will discuss our ongoing work in quantifying uncertainties in CNN mass predictions and our applications to spectroscopic datasets from the SDSS and GAMA surveys.
2:05 PM Inferring Galactic Parameters from Chemical Abundances: A Multi-Star Approach [Slides]
Oliver Philcox, Princeton University; Jan Rybizki, Max-Planck Institute for Astronomy
To understand galactic physics and create realistic simulations of the Milky Way, we require strong constraints on galactic evolution parameters, which govern effects such as the birth rate of massive stars and the frequency of supernovae. In this talk, I will outline a method to determine these precisely using the chemical element abundances and ages of a large set of stars. Inference is performed via a simple chemical evolution model in a hierarchical Bayesian framework, marginalizing over a large number of parameters describing the stars’ individual environments and model errors to account for inaccuracies in our model. Hamiltonian Monte Carlo is used to sample the posterior, and the sampling is sped up by the use of neural networks. I will show the parameter constraints obtained from simulations (which are competitive with those from other methods) and discuss future applications of the method.
2:25 PM Multiband Probabilistic Cataloging: A Joint Fitting Approach to Point Source Detection and Deblending
Richard Feder, California Institute of Technology
Probabilistic cataloging (PCAT) outperforms traditional cataloging methods on single-band optical data in crowded fields (Portillo et al. 2017). We extend our work to multiple bands, achieving greater sensitivity (~0.4 mag) and greater speed (500x) compared to previous single-band results. We demonstrate the effectiveness of multiband PCAT on mock data, both in terms of recovering accurate posteriors in the catalog space and in directly deblending sources. When applied to Sloan Digital Sky Survey (SDSS) observations of M2, taking Hubble Space Telescope data as truth, our joint fit on r- and i-band data goes ~0.4 mag deeper than single-band probabilistic cataloging and has a false discovery rate below 20% for F606W ≤ 20. Compared to DAOPHOT, the two-band SDSS catalog fit goes nearly 1.5 magnitudes deeper using the same data, and maintains a lower false discovery rate down to F606W ~ 20.5. Given recent improvements in computational speed, multiband PCAT shows promise in application to large-scale surveys and is a plausible framework for joint analysis of multi-instrument observational data.
2:45 PM Floor Discussion


Session 431

Thu Aug 6, 2020, 10:00am - 11:50am EDT

219396 Astronomical(ly) Big Data for Statisticians - Invited Papers

Section on Physical and Engineering Sciences, Astrostatistics Special Interest Group, Section on Statistical Consulting
Organizer(s): Vinay Kashyap, Center for Astrophysics | Harvard & Smithsonian
Chair(s): Gwendolyn M Eadie, University of Toronto

10:05 AM The Astrophysics Data Access Infrastructure [Slides]
Peter Kelsey George Williams, Center for Astrophysics | Harvard & Smithsonian
The field of astronomy has traditionally had a very open and robust infrastructure for data access, perhaps because astronomical data generally have no economic value. From 20th-century collections of photographic plates to modern databases synthesizing the measurements reported in thousands of journal articles, astronomers have long recognized that sharing and standardization of data enable new science not anticipated by the original investigators. The value of this tradition is becoming even more pronounced as industry-driven data science methods spread and Web technologies enable ever more powerful forms of remote data access and exploration. However, the data rates of modern astronomical instruments – terabytes per night – are pushing the existing infrastructure and astronomers’ technical skills to the limit. I will provide an overview of the astronomical data access landscape and offer some predictions of how it may evolve in the future.
10:25 AM X-ray data and its many challenges [Slides]
Kristin Madsen, Caltech
Astrophysical data taken in X-rays and gamma rays are rich in content, and the analysis challenges are correspondingly diverse. The observatory fleet that obtains the data consists of several different instruments, each of which focuses on different aspects, such as timing accuracy, high-resolution spectroscopy, high spatial resolution, or high/low energy coverage, and it is not uncommon to combine data sets from several instruments. Instrument calibration is therefore crucial for data analysis, and it often constitutes the largest source of uncertainty. Naturally, precisely how to include these errors correctly in complex data-fitting routines has become a topic of lively discussion, and in this talk I will review the challenges and discuss the implications of getting it wrong.
10:45 AM Gaia data: challenges for the exploitation of a large and complex dataset [Slides]
Xavier Luri, Universitat de Barcelona; Frederic Arenou, GEPI, Observatoire de Paris, Universite PSL, CNRS
In recent years it has become very common to hear statements about how Big Data, the availability of very large datasets, is revolutionising science. This applies to a wide variety of areas, but it is often forgotten that the breakthroughs achieved with these data come not only from their volume but especially from the ability to perform meaningful analyses with them. This ability requires large computing resources but also, and more critically, a proper understanding of the statistical properties of these samples and the ability to design statistical analysis tools that extract knowledge from the data. A clear example is the datasets produced by the Gaia mission of the European Space Agency, which is generating very large astrometric catalogues (two billion objects) with unprecedented accuracy. In this talk I will discuss the challenges faced by the astronomical community in fully exploiting its scientific potential. These challenges range from the basic need to understand the properties of the data (censoring, variable transformations, random errors, systematics) to the design and implementation of analysis tools appropriate to handle them.
11:05 AM Solar (Data) Explosion: Challenges in Using Large Astrophysical Imaging Data Sets [Slides]
Katharine Reeves, Harvard-Smithsonian Center for Astrophysics
The launch of the Solar Dynamics Observatory in 2010 pushed the field of solar physics solidly into the big data era by gathering several terabytes of imaging and magnetic field data of the Sun every day. The size of the data archive means that online visualization tools, metadata catalogs, and event databases have become increasingly important. In this talk, I will review some of these tools, as well as their challenges and limitations. These challenges include cleaning databases of false positives, calibration issues, and verifying completeness.
11:25 AM Discussant: Xiao-Li Meng, Harvard University
11:45 AM Floor Discussion


Session 539

Thu Aug 6, 2020, 1:00pm - 2:50pm EDT

219552 Challenging signal detection problems in astronomy – Topic Contributed Papers

Section on Physical and Engineering Sciences, Astrostatistics Special Interest Group
Organizer(s): Eric Feigelson, Pennsylvania State University
Chair(s): Vinay Kashyap, Center for Astrophysics | Harvard & Smithsonian

1:05 PM Challenges for detecting gravitational wave signals [Slides]
Jess McIver, Univ of British Columbia
Ground-based gravitational-wave detector data is non-stationary and contains a high rate of transient noise artifacts. This transient noise can mimic or obscure true astrophysical gravitational-wave events, reducing the effective reach of searches for these signals. This talk will summarize the methods employed by the LIGO, Virgo, and KAGRA collaborations to characterize and mitigate the impact of transient noise, including regression, statistical correlation, and machine learning.
1:25 PM Experimental design and discovery of unknown unknowns with the Rubin Observatory Legacy Survey of Space and Time [Slides]
Federica Bianco, University of Delaware
Astrophysics has been at the forefront of data science and statistics for decades, yet the Rubin Observatory Legacy Survey of Space and Time (LSST) will usher in a new era in data-intensive astrophysics. The next-generation ground-based astronomical survey, LSST will generate 20TB of information-rich imaging data every night for 10 years. A core deliverable of the survey is the exploration of the transient sky: astronomical sources that change brightness, color, and position, enabling a deep understanding of stellar physics and cosmology. Cutting-edge methodologies that scale with the data volume are under development to address event detection in stochastic time series, outlier and anomaly detection, and light-curve characterization, typically in irregularly spaced time series at the limit of the signal-to-noise ratio. However, to maximize the scientific throughput of the survey, statistics and data science methodologies have to enter the picture at the experimental design level. I will review applications of machine learning in experimental design to ensure that Rubin LSST enables real-time detection of rare and rapidly evolving transients and the discovery of unknown unknowns.
1:45 PM Statistical Opportunities and Challenges of Multiepoch Photometric Surveys
Tamas Budavari, The Johns Hopkins University
Refraction by the atmosphere causes measured source directions to change depending on the airmass through which the observations are taken. This subtle, wavelength-dependent shift, called differential chromatic refraction, provides new opportunities for modern ground-based astronomy surveys to obtain additional spectral information. Based on simulations of Large Synoptic Survey Telescope exposures, we expect this prism effect to be measurable from repeated observations of the same part of the sky over a range of different airmasses and parallactic angles. We will discuss initial successes and the challenges of inferring high-resolution spectral and spatial information from new-generation time-domain experiments.
2:05 PM A Multivariate Damped Random Walk Process for Irregularly-Spaced Multi-Filter Light Curves with Heteroscedastic Measurement Errors
Hyungsuk Tak, Pennsylvania State University; Zhirui Hu, Harvard University
In preparation for the era of LSST-driven time-domain astronomy, we propose a state-space representation of a multivariate damped random walk process as a tool to analyze irregularly spaced multi-filter light curves of an astronomical object with heteroscedastic measurement errors. The multi-band observations need not be made at the same times, and the light curves need not be of the same length, so the proposed process is particularly suitable for the multi-band light curves of the LSST. We adopt a computationally efficient Kalman-filtering approach to evaluate the likelihood function of the proposed model, leading to O(k^3 n) complexity, where k is the number of bands and n is the total number of observations across the bands. This is a significant computational advantage over a commonly used O(k^3 n^3) approach based on a univariate Gaussian process that stacks all multi-filter light curves into one vector. Using this efficient likelihood evaluation, we provide both maximum likelihood estimates and Bayesian posterior samples of the model parameters. We apply the proposed process to several astronomical data sets for numerical illustration.
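The complexity comparison in this abstract can be seen in the single-band case (k = 1). The minimal sketch below, written for this page under an assumed common damped random walk (Ornstein-Uhlenbeck) parametrization with mean `mu`, stationary variance `sigma2` and timescale `tau`, evaluates the same Gaussian log-likelihood two ways: a Kalman filter that visits each observation once, O(n), and a dense covariance matrix, O(n^3). It is not the authors' multivariate model or code, and all function names are hypothetical.

```python
import numpy as np


def drw_loglik_kalman(t, y, yerr, mu, sigma2, tau):
    """O(n) log-likelihood of a single-band damped random walk light curve.

    Assumed model (hypothetical parametrization): latent X(t) with mean mu,
    stationary variance sigma2 and Cov[X(s), X(t)] = sigma2 * exp(-|s - t| / tau);
    observations y_i = X(t_i) + e_i with e_i ~ N(0, yerr_i**2).
    """
    t, y, yerr = (np.asarray(a, dtype=float) for a in (t, y, yerr))
    order = np.argsort(t)                       # the filter runs forward in time
    t, y, yerr = t[order], y[order], yerr[order]
    m, p = mu, sigma2                           # predictive mean/variance at t[0]
    loglik = 0.0
    for i in range(t.size):
        if i > 0:                               # propagate the filtered state to t[i]
            a = np.exp(-(t[i] - t[i - 1]) / tau)
            m = mu + a * (m - mu)
            p = a * a * p + sigma2 * (1.0 - a * a)
        s = p + yerr[i] ** 2                    # innovation variance
        v = y[i] - m                            # innovation
        loglik -= 0.5 * (np.log(2.0 * np.pi * s) + v * v / s)
        gain = p / s                            # Kalman gain, then measurement update
        m = m + gain * v
        p = p * (1.0 - gain)
    return loglik


def drw_loglik_dense(t, y, yerr, mu, sigma2, tau):
    """Same likelihood via the full n x n covariance matrix: O(n^3)."""
    t, y, yerr = (np.asarray(a, dtype=float) for a in (t, y, yerr))
    cov = sigma2 * np.exp(-np.abs(t[:, None] - t[None, :]) / tau)
    cov += np.diag(yerr ** 2)
    r = y - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (r @ np.linalg.solve(cov, r) + logdet + t.size * np.log(2.0 * np.pi))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.sort(rng.uniform(0.0, 100.0, 50))     # irregular sampling
    yerr = rng.uniform(0.05, 0.3, 50)            # heteroscedastic errors
    y = rng.normal(0.0, 1.0, 50)
    print(drw_loglik_kalman(t, y, yerr, 0.0, 1.0, 10.0),
          drw_loglik_dense(t, y, yerr, 0.0, 1.0, 10.0))
```

The two functions agree to numerical precision because the damped random walk is Markov: each prediction step only needs the previous filtered state, so the likelihood factorizes into n one-step predictive densities instead of requiring a full matrix factorization.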
2:25 PM Discussant: Eric Feigelson, Pennsylvania State University
2:45 PM Floor Discussion


Other events of interest

Throughout JSM: AIG Virtual Community Table
Wed Aug 5, 5-6pm Astrostatistics Virtual Mixer (See Slack channel #jsm2020 for Zoom connection information)
Thu Aug 6, 2:20pm Larry Wasserman on Statistical Methods for some problems in Physics, in session Emerging Statistical Learning Methods in Modern Data Science