Title: | Event Categorization Matrix Classification for Nuclear Detonations |
---|---|
Description: | Implementation of an Event Categorization Matrix (ECM) detonation detection model and a Bayesian variant. Functions are provided for importing and exporting data, fitting models, and applying decision criteria for categorizing new events. This package implements methods described in the paper "Bayesian Event Categorization Matrix Approach for Nuclear Detonations" Koermer, Carmichael, and Williams (2024) available on arXiv at <doi:10.48550/arXiv.2409.18227>. |
Authors: | Scott Koermer [aut, cre, cph] |
Maintainer: | Scott Koermer <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2025-02-15 03:58:23 UTC |
Source: | https://github.com/lanl/ezecm |
Training a Bayesian ECM (B-ECM) model
BayesECM( Y, BT = c(100, 1000), priors = "default", verb = FALSE, transform = "logit" )
BayesECM( Y, BT = c(100, 1000), priors = "default", verb = FALSE, transform = "logit" )
Y |
|
BT |
integer vector of length 2, stipulating the number of |
priors |
list of parameters to be used in prior distributions. See details. |
verb |
logical. A setting of |
transform |
character string specifying the transform to use on the elements of |
The output of BayesECM()
provides a trained Bayesian Event Categorization Matrix (B-ECM) model, utilizing the data and prior parameter settings . If there are missing values in Y
, these values are imputed. A trained BayesECM
model is then used with the predict.BayesECM()
function to calculate expected category probabilities.
Before the data in Y
is used with the model, the p-values are transformed in an effort to better align the data with some properties of the normal distribution. When
transform == "logit"
the inverse of the logistic function maps the values to the real number line. Values of
Y
exactly equal to 0 or 1 cannot be used when transform == "logit"
. Setting the argument transform == "arcsin"
uses the transformation further described in Anderson et al. (2007). From here forward, the variable
should be understood to be the transformation of
Y
, where is the total number of rows in
Y
and is the number of discriminant columns in
Y
.
The B-ECM model structure can be found in a future publication, with some details from this publication are reproduced here.
B-ECM assumes that all data is generated using a mixture of normal distributions, where
is equal to the number of unique event categories. Each component of the mixture has a unique mean of
, and covariance of
, where
indexes the mixture component. The likelihood of the
event observation
of
discriminants can be written as the sum below.
Each Gaussian distribution in the sum is weighted by the scalar variable , where
so that the density integrates to 1.
There are prior distributions on each , and
, where
is the vector of mixture weights
. These prior distributions are detailed below. These parameters are important for understanding the model, however they are integrated out analytically to reduce computation time, resulting in a marginal likelihood
which is a mixture of matrix t-distributions.
is a matrix of the total data for the
event category containing
total event observations for training. The totality of the training data can be written as
, where
.
BayesECM()
can handle observations where some of the discriminants of an observation are missing. The properties of the conditional matrix t-distribution are used to impute the missing values, thereby accounting for the uncertainty related to the missing data.
The posterior distributions ,
, and
are dependent on the specifications of prior distributions
,
, and
.
is a multivariate normal distribution with a mean vector of
and is conditional on the covariance
.
is an Inverse Wishart distribution with degrees of freedom parameter
, or
nu
, and scale matrix , or
Psi
. is a Dirichlet distribution with the parameter vector
of length
.
The ability to use "default"
priors has been included for ease of use with various settings of the priors
function argument. The default prior hyperparameter values differ for the argument of transform
used, and the values can be inspected by examining the output of the BayesECM()
function. Simply setting priors = "default"
provides the same default values for all in the mixture. If all prior parameters are to be shared between all event categories, but some non-default values are desirable then supplying a list of a similar structure as
priors = list(eta = rep(0, times = ncol(Y) - 1), Psi = "default", nu = "default", alpha = 10)
can be used, where setting a list element "default"
can be exchanged for the correct data structure for the relevant data structure.
If one wishes to use some default values, but not share all parameter values between each event category, or wishes to specify each parameter value individually with no defaults, we suggest running and saving the output BayesECM(Y = Y, BT = c(1,2))$priors
. Note that when specifying eta
or Psi
it is necessary that the row and column order of the supplied values corresponds to the column order of Y
.
Returns an object of class("BayesECM")
. If there are missing data in the supplied argument Y
the object contains Markov-chain Monte-Carlo samples of the imputed missing data. Prior distribution parameters used are always included in the output. The primary use of an object returned from BayesECM()
is to later use this object to categorize unlabeled data with the predict.BayesECM()
function.
csv_use <- "good_training.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) trained_model <- BayesECM(Y = training_data)
csv_use <- "good_training.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) trained_model <- BayesECM(Y = training_data)
Outputs batch performance metrics of decisions using the output of predict.BayesECM()
when the true category of testing events are known. Can be used for empirically comparing different model fits.
becm_decision( bayes_pred = NULL, vic = NULL, cat_truth = NULL, alphatilde = 0.05, pn = TRUE, C = matrix(c(0, 1, 1, 0), ncol = 2), rej = NULL )
becm_decision( bayes_pred = NULL, vic = NULL, cat_truth = NULL, alphatilde = 0.05, pn = TRUE, C = matrix(c(0, 1, 1, 0), ncol = 2), rej = NULL )
bayes_pred |
An object of class |
vic |
Character string, indicating the "Very Important Category" ( |
cat_truth |
Vector of the same length as |
alphatilde |
Numeric scalar between 0 and 1 used for the significance level of typicality indices. |
pn |
Logical. Acronym for "false Positive, false Negative". |
C |
Square matrix of dimension 2 or the number of categories. Used as the loss function in the decision theoretic framework. The default is 0-1 loss for binary categorization. |
rej |
|
The matrix C
specifies the loss for a set of categorization actions for a single new observation given the probability of
belonging to each of
categories. Actions are specified as the columns, and the event category random variables are specified as the rows. See the vignette
vignette("syn-data-code", package = "ezECM")
for more mathematical details.
The dimension of matrix C
specifies the categorization type. A dimension of 2 is binary categorization, with the first column and row always corresponding to the category chosen as the vic
argument. Otherwise, when the dimension of C
is equal to the number of categories, the indices of the rows and columns of C
are in the same order as the categories listed for names(bayes_pred$BayesECMfit$Y)
.
A list of two data frames of logicals. The rows in each data frame correspond to the rows in bayes_pred$Ytilde
. The first data frame, named results
, has three columns named correct
, fn
, and fp
. The results
column indicates if the categorization is correct. fn
and fp
stand for false negatives and false positives respectively. fn
and fp
are found under binary categorization. Values of NA
are returned when a false positive or false negative is not relevant. The second data frame, named rej
, indicates the rejection of each new observation from each category via typicality index. One can use rej
to inspect if the typicality index played a role in categorization, or to supply to another call of becm_decision
.
csv_use <- "good_training.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) trained_model <- BayesECM(Y = training_data, BT = c(10,1000)) csv_use <- "good_newdata.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") new_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) bayespred <- predict(trained_model, Ytilde = new_data) accuracy <- becm_decision(bayes_pred = bayespred, vic = "explosion", cat_truth = new_data$event)
csv_use <- "good_training.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) trained_model <- BayesECM(Y = training_data, BT = c(10,1000)) csv_use <- "good_newdata.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") new_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) bayespred <- predict(trained_model, Ytilde = new_data) accuracy <- becm_decision(bayes_pred = bayespred, vic = "explosion", cat_truth = new_data$event)
Fits a regularized discriminant analysis model to labeled training data and generates an aggregate p-value for categorizing newly obtained data.
cECM(x, newdata = NULL, rda_params = NULL, transform = TRUE)
cECM(x, newdata = NULL, rda_params = NULL, transform = TRUE)
x |
Either a |
newdata |
a |
rda_params |
a |
transform |
Logical indicating if the supplied p-values should be transformed by the function |
Details on regularized discriminant analysis (RDA) can be found in Friedman (1989). Details on related implementation found in Anderson et al. (2007).
A list. Any returned objects contain a list element indicating the value of transform
supplied to the cECM
function call, as well as a klaR::rda()
object related to relevant training data. In addition if newdata
argument is supplied, the returned list contains a data.frame
specifying aggregate p-values for each new event (rows) for related event category (columns).
Anderson DN, Fagan DK, Tinker MA, Kraft GD, Hutchenson KD (2007).
“A mathematical statistics formulation of the teleseismic explosion identification problem with multiple discriminants.”
Bulletin of the Seismological Society of America, 97(5), 1730–1741.
Friedman JH (1989).
“Regularized discriminant analysis.”
Journal of the American statistical association, 84(405), 165–175.
x <- pval_gen(sims = 20, pwave.arrival = list(optim.starts = 5)) s <- sample(1:20, size = 2) newdata <- x[s,] newdata <- newdata[,-which(names(newdata) == "event")] x <- x[-s,] pval_cat <- cECM(x = x, transform = TRUE) pval_cat <- cECM(x = pval_cat, newdata = newdata)
x <- pval_gen(sims = 20, pwave.arrival = list(optim.starts = 5)) s <- sample(1:20, size = 2) newdata <- x[s,] newdata <- newdata[,-which(names(newdata) == "event")] x <- x[-s,] pval_cat <- cECM(x = x, transform = TRUE) pval_cat <- cECM(x = pval_cat, newdata = newdata)
Returns category decisions for uncategorized events. When the true category of such events are known performance metrics are returned.
cECM_decision(pval = NULL, alphatilde = NULL, vic = NULL, cat_truth = NULL)
cECM_decision(pval = NULL, alphatilde = NULL, vic = NULL, cat_truth = NULL)
pval |
Class |
alphatilde |
Scalar numeric between 0 and 1, used as the significance level for hypothesis testing. |
vic |
Character vector of length one, indicating the Very Important Category (VIC). Used for judging accuracy, false negatives, and false positives for binary categorization. Required when |
cat_truth |
Character vector corresponding to the true group of each row in the |
When is.null(cat_truth) == TRUE
, categorization over all event categories is used, using the same framework seen in (Anderson et al. 2007). The return value of "indeterminant"
happens when there is a failure to reject multiple event categories. "undefined"
is returned when all event categories are rejected.
When the arguments cat_truth
and vic
are included, binary categorization is utilized instead of categorization over all training categories. The definition of accuracy is more ambiguous when categorizing over all training categories and the user is encouraged to develop their own code for such a case. The goal of binary categorization is to estimate if an uncategorized observation is or is not in the event category stipulated by vic
. Uncategorized events which are "indeterminant"
or "undefined"
are deemed to not be in the vic
category.
When is.null(cat_truth) == TRUE
a vector providing the categorization over all event categories is returned. When the cat_truth
and vic
arguments are supplied a list is returned containing a data.frame
detailing if each event was categorized accurately in a binary categorization framework, was a false positive, was a false negative, and the estimated event category. A vector stating overall categorization accuracy, false positive rate, and false negative rate is included with the list.
Anderson DN, Fagan DK, Tinker MA, Kraft GD, Hutchenson KD (2007). “A mathematical statistics formulation of the teleseismic explosion identification problem with multiple discriminants.” Bulletin of the Seismological Society of America, 97(5), 1730–1741.
file_path <- system.file("extdata", "good_training.csv", package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) newdata <- training_data[1:10,] cat_truth <- newdata$event newdata$event <- NULL training_data <- training_data[-(1:10),] pval <- cECM(training_data, transform = TRUE, newdata = newdata) binary_decision <- cECM_decision(pval = pval, alphatilde = 0.05, vic = "explosion", cat_truth = cat_truth) decision <- cECM_decision(pval = pval, alphatilde = 0.05)
file_path <- system.file("extdata", "good_training.csv", package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) newdata <- training_data[1:10,] cat_truth <- newdata$event newdata$event <- NULL training_data <- training_data[-(1:10),] pval <- cECM(training_data, transform = TRUE, newdata = newdata) binary_decision <- cECM_decision(pval = pval, alphatilde = 0.05, vic = "explosion", cat_truth = cat_truth) decision <- cECM_decision(pval = pval, alphatilde = 0.05)
export_ecm()
saves an ECM model fit to training data for later use. The object can be the output of the cECM()
or BayesECM()
functions. Object saved in user specified location.
import_ecm()
loads a previously exported ECM model into the global environment.
export_ecm(x = NULL, file = stop("'file' must be specified"), verb = TRUE) import_ecm(file = NULL, verb = TRUE)
export_ecm(x = NULL, file = stop("'file' must be specified"), verb = TRUE) import_ecm(file = NULL, verb = TRUE)
x |
Name of ECM model to be exported. |
file |
Character string containing absolute or relative path of |
verb |
Logical indicating if a message indicating success should be printed to the console. |
Saves file or imports file into global environment.
x <- pval_gen(sims = 20, pwave.arrival = list(optim.starts = 5)) pval_cat <- cECM(x = x, transform = TRUE) export_ecm(x = pval_cat, file = "example-ecm.rda", verb = FALSE) reload_pval_cat <- import_ecm(file = "example-ecm.rda") ## The below code shouldn't be used, ## it deletes the file created during the example unlink("example-ecm.rda")
x <- pval_gen(sims = 20, pwave.arrival = list(optim.starts = 5)) pval_cat <- cECM(x = x, transform = TRUE) export_ecm(x = pval_cat, file = "example-ecm.rda", verb = FALSE) reload_pval_cat <- import_ecm(file = "example-ecm.rda") ## The below code shouldn't be used, ## it deletes the file created during the example unlink("example-ecm.rda")
Imports and organizes observed p-values located in a *.csv
file for training or categorization using an existing model.
import_pvals(file = NULL, header = TRUE, sep = ",", training = TRUE)
import_pvals(file = NULL, header = TRUE, sep = ",", training = TRUE)
file |
Character string providing name of |
header |
Logical indicating if first row of supplied |
sep |
Character string indicating the field separator character for the supplied |
training |
Logical indicating if the supplied |
The purpose of this function is to give the user a way to import p-value data in which the data will be organized for use with the cECM()
and BayesECM()
functions. Warnings are printed when potential issues may arise with the supplied file, and the function attempts to detect and correct simple formatting issues.
Ideally, the user supplies a *.csv
file which contains a header row labeling the columns. Each column contains the p-values of a particular discriminant, and each row must correspond to a single event. If training data is to be imported, the column containing known event categories is labeled "event"
. If new data is imported to be used with an existing model fit, the order of the columns in the new.data
file must be the same as the *.csv
file containing training data.
A base::data.frame()
of p-values with each row corresponding to a single event and each column corresponding to a particular discriminant. If data labels are correctly provided in the supplied *.csv
file, an additional column labeled event
will hold these values.
file_path <- system.file("extdata", "good_training.csv", package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE)
file_path <- system.file("extdata", "good_training.csv", package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE)
Similar utility to time_fn, however multiple seismometer locations can be provided simultaneously and normally distributed noise is added to the arrival time.
P_wave_gen( Si = NULL, S0 = NULL, Sig = NULL, neg.obs = TRUE, eps = sqrt(.Machine$double.eps) )
P_wave_gen( Si = NULL, S0 = NULL, Sig = NULL, neg.obs = TRUE, eps = sqrt(.Machine$double.eps) )
Si |
Numeric matrix providing seismometer locations. Must contain 3 columns corresponding to (X,Y) corrdinates and depth. |
S0 |
Numeric 3 element vector stipulating the location of an event, elements correspond to (X, Y, Z) |
Sig |
Numeric vector, or diagonal matrix, providing the variance in observed arrival times at each seismometer. |
neg.obs |
Logical indicating whether to allow negative observations of time (eg. the observed time of p-wave arrival is before the true time for the event). |
eps |
Numeric. If |
Numeric vector of observation times that correspond to the rows of Si
pwave.obs <- P_wave_gen(Si = c(100,200,3), S0 = c(400, 500, 4), Sig = 0.05)
pwave.obs <- P_wave_gen(Si = c(100,200,3), S0 = c(400, 500, 4), Sig = 0.05)
cECM()
categorizationPlot the data and output of cECM()
categorization
## S3 method for class 'cECM' plot(x, discriminants = c(1, 2), thenull = NULL, alphatilde = 0.05, ...)
## S3 method for class 'cECM' plot(x, discriminants = c(1, 2), thenull = NULL, alphatilde = 0.05, ...)
x |
an object of which is of class |
discriminants |
character or integer vector of length two. If a character vector is provided, the character strings must match a subset of the column names for the training data, ie. |
thenull |
character string or |
alphatilde |
numeric value specifying hypothesis testing significance level. Used in conjunction with |
... |
arguments passed to |
The plot generated from plot.ecm() is first dependent on if the provided x$newdata
contains a data frame of unlabled data.
If unlabled data is not part of the "cECM"
object, the labled data is simply plotted with the 0.68 and 0.95 confidence levels obtained from the distribution fits returned from cECM()
.
If unlabled data is part of the "cECM"
object, the unlabled data is plotted in addition to the distribution plots. Each unlabled data point appears on the plot as an integer, which indexes the corresponding row of x$newdata
.
Plot illustrating results of cECM()
x <- pval_gen(sims = 20, pwave.arrival = list(optim.starts = 5)) s <- sample(1:20, size = 2) newdata <- x[s,] newdata <- newdata[,-which(names(newdata) == "event")] x <- x[-s,] pval_cat <- cECM(x = x, transform = TRUE) pval_cat <- cECM(x = pval_cat, newdata = newdata) plot(x = pval_cat, thenull = "explosion")
x <- pval_gen(sims = 20, pwave.arrival = list(optim.starts = 5)) s <- sample(1:20, size = 2) newdata <- x[s,] newdata <- newdata[,-which(names(newdata) == "event")] x <- x[-s,] pval_cat <- cECM(x = x, transform = TRUE) pval_cat <- cECM(x = pval_cat, newdata = newdata) plot(x = pval_cat, thenull = "explosion")
New Event Categorization With Bayesian Inference
## S3 method for class 'BayesECM' predict(object, Ytilde, thinning = 1, mixture_weights = "training", ...)
## S3 method for class 'BayesECM' predict(object, Ytilde, thinning = 1, mixture_weights = "training", ...)
object |
an object of |
Ytilde |
|
thinning |
integer, scalar. Values greater than one can be provided to reduce computation time. See details. |
mixture_weights |
character string describing the weights of the distributions in the mixture to be used for prediction. The default, |
... |
not used |
The data in Ytilde
should be the p-values . The transformation applied to the data used to generate
object
is automatically applied to Ytilde
within the predict.BayesECM()
function.
For a given event with an unknown category, a Bayesian ECM model seeks to predict the expected value of the latent variable , where
is a vector of the length
, and
is the number of event categories. A single observation of
is a draw from a Categorical Distribution.
The expected probabilities stipulated within the categorical distribution of are conditioned on any imputed missing data, prior hyperparameters, and individually each row of
Ytilde
. The output from predict.BayesECM()
are draws from the distribution of , where
represents the observed values within the training data.
The argument mixture_weights
controls the value of , the probability of each
, before
is observed. The standard result is obtained from the prior hyperparameter values in
and the number of unique events in each
. Setting
mixture_weights = "training"
will utilize this standard result in prediction. If the frequency of the number events used for each category in training is thought to be problematic, providing the argument mixture_weights = "equal"
sets . If the user wants to use a set of
which are not equal but also not informed by the data, we suggest setting the elements of the hyperparameter vector
equal to values with a large magnitude and in the desired ratios for each category. However, this can cause undesirable results in prediction if the magnitude of some elements of
are orders larger than others.
To save computation time, the user can specify an integer value for thinning
greater than one. Every thinning
th Markov-chain Monte-Carlo sample is used for prediction. This lets the user take a large number of samples during the training step, allowing for better mixing. See details in a package vignette by running vignette("syn-data-code", package = "ezECM")
Returns a list
. The list element epz
is a matrix with nrow(Ytilde)
rows, corresponding to each event used for prediction, and K
named columns. Each column of epz
is the expected category probability of the row stipulated event. The remainder of the list elements hold data including Ytilde
, information about additonal variables passed to predict.BayesECM
, and data related to the previous BayesECM()
fit.
csv_use <- "good_training.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) trained_model <- BayesECM(Y = training_data, BT = c(10,1000)) csv_use <- "good_newdata.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") new_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) bayes_pred <- predict(trained_model, Ytilde = new_data)
csv_use <- "good_training.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) trained_model <- BayesECM(Y = training_data, BT = c(10,1000)) csv_use <- "good_newdata.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") new_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) bayes_pred <- predict(trained_model, Ytilde = new_data)
p-values are simulated for depth and first polarity discriminants to use in testing classification models.
pval_gen( sims = 100, grid.dim = c(800, 800, 30), seismometer = list(N = 100, max.depth = 2, Sig = NULL, sig.draws = c(15, 2)), explosion = list(max.depth = 5, prob = 0.4), pwave.arrival = list(H0 = 5, V = 5.9, optim.starts = 15), first.polarity = list(read.err = 0.95) )
pval_gen( sims = 100, grid.dim = c(800, 800, 30), seismometer = list(N = 100, max.depth = 2, Sig = NULL, sig.draws = c(15, 2)), explosion = list(max.depth = 5, prob = 0.4), pwave.arrival = list(H0 = 5, V = 5.9, optim.starts = 15), first.polarity = list(read.err = 0.95) )
sims |
numeric stipulating the number of individual events to generate. |
grid.dim |
numeric 3-vector providing the extent of the coordinate system in |
seismometer |
list stipulating variables relating to the seismometers. Providing an incomplete list reverts to the default values. List elements are:
|
explosion |
list stipulating variables regarding a detonation event. Providing an incomplete list reverts to the default values. List elements are:
|
pwave.arrival |
list stipulating variables regarding the depth from p-wave arrival discriminant. Providing an incomplete list reverts to the default values. List elements are:
|
first.polarity |
list stipulating variables regarding the depth from the polarity of first motion discriminant. List elements are:
|
Methods are adapted from the discriminants listed in (Anderson et al. 2007).
A data frame, with the number of rows equal to sims
. Columns contain the p-values observed for each simulation along with the true event type.
The equation below is used to model p-wave arrival time at seismometer
.
Where is the time of the event,
is a function modeling the arrival time (in this case time_fn),
is the location of seismometer
,
is the location of the event, and
is normally distributed error with known variance
. Given
N
seismometers, the MLE of the event time can be solved as the following:
Where is the matrix trace operation,
is a matrix containing the elements of each
on the diagonal,
is a diagonal matrix containing each
, and
is a diagonal matrix containing each
. This result is then plugged back into the first equation, which is then used in a normal likelihood function. Derivatives are taken of the likelihood so that a fast gradient based approch can be used to find the maximum likelihood estimate (MLE) of
.
The remainder of the calculation of the p-value is consistent with the Depth from P-Wave Arrival Times section of (Anderson et al. 2007). First note is equal to the 3-vector
. Given the null hypothesis for the depth of
, the MLE
given
is found. The sum of squared errors (SSE) is calculated as follows:
If is true then the following statistic has a central
distribution with
and
degrees of freedom.
Because the test has directionality, the statistic is then converted to a
statistic as such:
This statistic is then used to compute the p-value
Under the null hypothesis that the event is an explosion, (and therefore the true polarity of first motion is always one), the error rate for mistakenly identifying the polarity of first motion is stipulated as the argument first.polarity$read.err
. For an error rate of the p-value can then be calculated as follows:
Where is the number of stations where a positive first motion was observed, and
is the total number of stations.
Anderson DN, Fagan DK, Tinker MA, Kraft GD, Hutchenson KD (2007). “A mathematical statistics formulation of the teleseismic explosion identification problem with multiple discriminants.” Bulletin of the Seismological Society of America, 97(5), 1730–1741.
test.data <- pval_gen(sims = 5)
test.data <- pval_gen(sims = 5)
Tabulates results from the predict.BayesECM()
function for quick analysis.
## S3 method for class 'BayesECMpred' summary(object, index = 1, category = 1, C = 1 - diag(2), ...)
## S3 method for class 'BayesECMpred' summary(object, index = 1, category = 1, C = 1 - diag(2), ...)
object |
an object of |
index |
integer stipulating the event of interest. Value corresponds to the row index of |
category |
integer for the index of the category of interest for hypothesis testing. Alternatively, a character string naming the category of interest can be provided. |
C |
square matrix of dimension 2, providing loss values to be used in hypothesis testing. See Details. |
... |
not used |
summary.BayesECMpred()
prints expected loss for the binary hypothesis stipulated by category
. Expected loss is calculated using the loss matrix specified with argument C
. The default values for C
result in 0-1 loss being used. Format details for the loss matrix can be found in vignette("syn-data-code")
.
Typicality indices are used in cECM_decision()
as part of the decision criteria. Here, we have adapted typicality indices for use with a Bayesian ECM model for outlier detection, when a new observation may not be related to the categories used for training. Probability of the p-value being less than a significance level of 0.05 is reported. If no missing data is used for training, this probability is either 0 or 1.
Prints a summary including probability of each category for the event stipulated by index
, minimum expected loss for binary categorization, and probability of a-typicality of the event for the category specified by category
.
csv_use <- "good_training.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) trained_model <- BayesECM(Y = training_data, BT = c(10,1000)) csv_use <- "good_newdata.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") new_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) bayespred <- predict(trained_model, Ytilde = new_data)
csv_use <- "good_training.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) trained_model <- BayesECM(Y = training_data, BT = c(10,1000)) csv_use <- "good_newdata.csv" file_path <- system.file("extdata", csv_use, package = "ezECM") new_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE) bayespred <- predict(trained_model, Ytilde = new_data)
Simulates the arrival time of p-waves (without noise) using a mean velocity and euclidian distance, without taking the curvature into account.
time_fn(X = NULL, X0 = NULL, V = 7)
time_fn(X = NULL, X0 = NULL, V = 7)
X |
numeric three element vector stipulating the location of a seismometer. Units of km |
X0 |
numeric three element vector stipulating the location of an event. Units of km |
V |
numeric scalar supplying the p-wave velocity in km/s. A quick google search (Jan 31st 2023) shows typical values range from 5 to 8 km/s. |
Used for estimating event location, given a series of seismometer observations.
numeric scalar providing the time in seconds for a p-wave to arrive at a seismometer.
arrival.time <- time_fn(X = c(100,200,10), X0 = c(500, 80, 25), V = 5)
arrival.time <- time_fn(X = c(100,200,10), X0 = c(500, 80, 25), V = 5)