This textbook is designed for an undergraduate course in structural analysis and design, as well as for a follow-up course on numerical (matrix-based) methods for structural analysis, i.e., an introduction to finite element analysis. The book has ten chapters.
Our method is demonstrated to be effective for the prediction of conformational epitopes. Based on the study, we develop a tool to predict the conformational epitopes from 3D structures, available at -project-bpredictor/downloads/list.
Matrix Computer Analysis Of Structures Rubinstein Pdf Download
Evolutionary conservation: Functional regions on protein surfaces are generally more evolutionarily conserved than other regions, but the study of antigen crystal structures draws the opposite conclusion. A statistical test reveals that evolutionary conservation can significantly distinguish epitopes from non-epitope regions [42]. To calculate conservation scores, the primary sequence of the antigen chain to be predicted is aligned against the non-redundant protein database using the BLAST program (with the number of iterations set to 3), which returns a position-specific scoring matrix (PSSM). The conservation score of the residue at sequence position i is then calculated by the following function:
We further use a paired t-test on the AUC scores of the test structures to compare the methods. Because such tests usually require a large number of samples, the limited number of structures in the study yields no statistically significant difference (p-value > 0.05).
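As a rough illustration of the paired comparison described above, the following minimal sketch computes the paired t statistic from per-structure AUC scores of two methods. The AUC values are hypothetical placeholders, not results from the study, and the function name `paired_t_statistic` is an assumption made for this example. Only the standard library is used, so the p-value itself is not computed; the statistic would be compared against a Student t table with the returned degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    if len(scores_a) < 2:
        raise ValueError("at least two pairs are required")
    # Work on the per-structure differences between the two methods.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    std_err = sd_diff / math.sqrt(len(diffs))
    return mean_diff / std_err, len(diffs) - 1


# Hypothetical AUC scores of two methods on the same four test structures.
auc_method_a = [0.60, 0.55, 0.70, 0.65]
auc_method_b = [0.58, 0.57, 0.66, 0.60]

t_stat, dof = paired_t_statistic(auc_method_a, auc_method_b)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

With so few pairs the critical value is large (about 3.18 for a two-sided test at the 0.05 level with 3 degrees of freedom), which matches the observation that a small test set rarely reaches significance.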
Mining the spatial context of Ag-Ab interaction and predicting B-cell conformational epitopes are essential for understanding the immune response and for vaccine design. In this study, we systematically investigate the basic knowledge about epitope recognition and aim to improve the performance of existing methods. We develop a novel method to predict conformational epitopes based on the 'thick surface patch' by combining conventional features with the 'adjacent residue distance' feature. The experiments show that, when evaluated by LOOCV, our method yields a mean AUC of 0.633 on the benchmark bound dataset and a mean AUC of 0.654 on the benchmark unbound dataset. In the independent test, the bound dataset-based model and the unbound dataset-based model produce mean AUC values of 0.589 and 0.598, respectively, on 19 independent test structures. Compared with the state-of-the-art methods, our methods show comparable or better performance on the independent test set. Our study also provides biological insights into the spatial context of residues as well as the roles of conventional features in antigen-antibody interactions. The standalone tool based on the study is available at -project-bpredictor/downloads/list.
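The mean AUC reported above is an average of per-structure AUC values. A minimal sketch of that computation follows, using the rank-based (Mann-Whitney) definition of AUC: the fraction of epitope/non-epitope residue pairs in which the epitope residue receives the higher score, with ties counted as one half. The residue scores and labels are hypothetical, and the function names are assumptions for this example rather than part of the published tool.

```python
def auc(scores, labels):
    """AUC for one structure: labels are 1 (epitope) or 0 (non-epitope)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(positives) * len(negatives))


def mean_auc(structures):
    """Average the per-structure AUC over (scores, labels) pairs."""
    values = [auc(scores, labels) for scores, labels in structures]
    return sum(values) / len(values)


# Hypothetical per-residue prediction scores for two small structures.
structures = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    ([0.7, 0.2, 0.6, 0.4], [1, 0, 1, 0]),
]
print(mean_auc(structures))
```

The pairwise loop is quadratic in the number of residues, which is acceptable for single antigen chains; a sort-based rank sum would be the usual choice for larger inputs.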
Here, we demonstrate the structural composition and mechanical transformations of both the shaft and the tubule during distinct phases of nematocyst discharge in Nematostella, and further report the operating mechanism of the nematocyst thread sub-structures. Our analysis reveals the complex structure and the sophisticated biomechanical transformations underpinning the operational mechanism of nematocysts.
A major task in the analysis of high-dimensional single-cell data is to find low-dimensional representations of the data that capture the salient biological signals and render the data more interpretable and amenable to further analyses. As it happens, the matrix factorization and latent-space learning methods used for that task also provide a third route for imputation: they can reconstruct the observed data matrix from simplified representations of it.
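To make the idea of imputation by reconstruction concrete, here is a deliberately simplified sketch: a rank-1 factorization fitted by alternating least squares on the observed entries only, whose product then fills the missing entries. This is an illustrative stand-in, not any specific single-cell method; the function name `rank1_impute`, the use of `None` for missing values, and the toy matrix are all assumptions for this example.

```python
def rank1_impute(matrix, n_iter=200):
    """Fill None entries with u[i] * v[j] from a rank-1 fit to observed entries."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows  # row factors
    v = [1.0] * n_cols  # column factors
    for _ in range(n_iter):
        # Update each row factor by least squares over that row's observed entries.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Update each column factor the same way, holding the row factors fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den
    # Keep observed values; use the low-rank reconstruction only where data is missing.
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy matrix that is exactly rank 1, with two entries hidden.
observed = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, None],
    [3.0, None, 9.0],
]
for row in rank1_impute(observed):
    print([round(x, 3) for x in row])
```

Real expression matrices need a higher rank and some form of regularization; the rank-1 case is used here only because it keeps the update rule short enough to read at a glance.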
The densities corresponding to the a subunit (green), the membrane-inserted portion of the b subunit (red), and the A6L subunit (blue) are shown, together with a ribbon diagram of the a subunit (green) and the top 90 constraints from analysis of covarying residues in the a subunit sequence (red lines). 6% of the constraints could not be satisfied, which is consistent with the false positive rate from known structures (Marks et al., 2011). Scale bar, 25 Å.
Abstract: In electromagnetic models, the return-stroke channel is represented as an antenna excited at its base by either a voltage or a current source. To adjust the speed of the current pulse propagating in the channel to available optical observations, different representations of the return-stroke channel have been proposed in the literature, using different techniques to artificially reduce the propagation speed of the current pulse to values consistent with observations. In this paper, we present an analysis of the available electromagnetic models in terms of their practical implementation. The criteria used for the analysis are the ease of implementation of the models, the numerical accuracy and the required computer resources, as well as their ability to reproduce a desired value for the speed of the return-stroke current pulse. Using the CST-MWS software, which is based on the time-domain finite-integration technique, different electromagnetic models were analyzed, namely (A) a wire embedded in a fictitious half-space dielectric medium (other than air), (B) a wire embedded in a fictitious coating with permittivity (εr) and permeability (μr), and (C) a wire in free space loaded by distributed series inductance and resistance. It is shown that, by adjusting the parameters of each model, it is possible to reproduce a desired value for the speed of the current pulse. For each of the considered models, we determined the values of the adjustable parameters that yield the desired return-stroke speed. Model A is the least expensive in terms of computing resources. However, it requires two simulation runs to obtain the electromagnetic fields. A variant of Model B that includes a fictitious dielectric/ferromagnetic coating is found to control the current speed along the channel more efficiently than a purely dielectric coating. On the other hand, this model requires an increased number of mesh cells, resulting in higher memory use and computation time. The presence of an inhomogeneous medium generates, in addition, unphysical fluctuations in the resulting current distributions. These fluctuations, which strongly depend on the size of the coating as well as on its electric and magnetic properties, can be attenuated by considering conductive losses in the coating. Considering the efficiency in terms of the required computer resources and ease of implementation, we recommend the use of Model C (wire loaded by distributed inductance and resistance).
Keywords: lightning; return-stroke model; electromagnetic-model; CST-MWS; finite integration method; speed of the return-stroke current pulse
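As a first-order illustration of how such parameters relate to the pulse speed, the sketch below uses two idealized textbook relations: v = c / sqrt(εr μr) for a wire fully surrounded by a homogeneous medium (the situation closest to Model A), and v = c / sqrt(1 + L / L0) for a lossless wire with added series inductance L per unit length (the situation closest to Model C). These are assumptions made for illustration, not the parameter values determined in the paper; a finite coating as in Model B is not covered by either relation, and the natural inductance `l0` of the wire is a hypothetical input that would have to be estimated separately.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def relative_permittivity_for_speed(v, mu_r=1.0):
    """eps_r such that C0 / sqrt(eps_r * mu_r) equals v (homogeneous medium)."""
    if not 0 < v <= C0:
        raise ValueError("v must be positive and not exceed the speed of light")
    return (C0 / v) ** 2 / mu_r


def added_inductance_for_speed(v, l0):
    """Extra series inductance (H/m) such that C0 / sqrt(1 + L / l0) equals v.

    l0 is the natural per-unit-length inductance of the bare wire (H/m),
    a hypothetical input here; resistive loading is ignored.
    """
    if not 0 < v <= C0:
        raise ValueError("v must be positive and not exceed the speed of light")
    return l0 * ((C0 / v) ** 2 - 1.0)


# Example: target a pulse speed of half the speed of light.
target_speed = 0.5 * C0
print(relative_permittivity_for_speed(target_speed))
print(added_inductance_for_speed(target_speed, l0=2.0e-6))  # l0 is illustrative
```

Estimates of this kind only give a starting point; the distributed resistance in Model C and the finite extent of a coating both shift the effective speed, which is why the values in a full-wave simulation are tuned numerically.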
Junyang Qian, Yosuke Tanigawa, Wenfei Du, Matthew Aguirre, Chris Chang, Robert Tibshirani, Manuel A. Rivas, Trevor Hastie. A Fast and Scalable Framework for Large-scale and Ultrahigh-dimensional Sparse Regression with Application to the UK Biobank. PLOS Genetics, October 2020. We develop a scalable lasso algorithm for fitting polygenic risk scores at GWAS scale. There is also a bioRxiv version. Our R package snpnet combines efficient batch-wise strong-rule screening with glmnet to fit lasso regularization paths on phenotypes in the UK Biobank data. Here is a link to the code and scripts used in the paper _snpnet_paper
Lukasz Kidzinski and Trevor Hastie. Longitudinal data analysis using matrix completion. We use a regularized form of matrix completion to fit functional principal component models, and extend these to other multivariate longitudinal regression models. We have an R package fcomplete which includes three vignettes demonstrating how it can be used.
2017
Rahul Mazumder, Jerome Friedman and Trevor Hastie: SparseNet: Coordinate Descent with Non-Convex Penalties. JASA 2011, 106(495) 1125-1138. Non-convex penalties produce sparser models than the LASSO, but pose difficulties for optimization. We propose a structured algorithm using coordinate descent which finds good solutions with guaranteed convergence properties. Appendix for SparseNet paper with extra figures and some additional technical proofs. sparsenet R package available from CRAN (Feb 2012).
Rahul Mazumder, Trevor Hastie and Rob Tibshirani: Spectral Regularization Algorithms for Learning Large Incomplete Matrices. We develop an iterative algorithm for matrix completion using nuclear-norm regularization. JMLR 2010 11 2287-2322. MATLAB package SoftImpute for matrix completion (zip archive). R package to appear soon.
Daniela Witten, Rob Tibshirani and Trevor Hastie: A penalized matrix decomposition, with applications to sparse canonical correlation analysis and principal components. Biostatistics 10(3)
Trevor Hastie, Robert Tibshirani and Jerome Friedman, Elements of Statistical Learning: Data Mining, Inference and Prediction (Second Edition). February, 2009. 745 pages in full color. Springer-Verlag, New York. This second edition adds 4 new chapters: Random Forests, Ensemble Learning, Undirected Graphical Models, and High Dimensional Problems: p>>N. For more details see ESL book homepage. In an agreement with Springer, we are able to offer for free the ESL book pdf (8.2M).
2008
2000
Trevor Hastie, Robert Tibshirani, David Botstein and Pat Brown, "Supervised Harvesting of Expression Trees" (postscript). Starting from a hierarchically clustered expression array, we build a predictive model for an outcome variable using cluster nodes as inputs. (pdf version) Tech. report. August 2000.
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman, Missing value estimation methods for DNA microarrays. Bioinformatics, Vol. 17 no. 6, 2001, Pages 520-525
Eva Cantoni and Trevor Hastie "Degrees-of-Freedom Tests for Smoothing Splines." Tech Report, May 2000. Published in Biometrika 2002, 89, 251-263. A mixed-effects framework for smoothing splines and additive models allows for exact tests between nested models of different complexity. The complexity is calibrated via the effective degrees of freedom.
Thomas Yee and Trevor Hastie. Reduced Rank Vector Generalized Linear Models (2003) Statistical Modeling, 3, pages 15-41. Using the multinomial as a primary example, we propose reduced rank logit models for discrimination and classification. This is a conditional version of the reduced rank model of linear discriminant analysis.
Robert Tibshirani, Guenther Walther and Trevor Hastie. "Estimating the number of clusters in a dataset via the Gap statistic". Journal of the Royal Statistical Society, B, 63:411-423, 2001.
Stochastic Modeling and Tracking of Human Motion, a joint project with Dirk Ormoneit and Michael Black's group at Xerox PARC, with motion graphics demonstrations of learned walking characteristics.
Page 50 of "Generalized Additive Models" by Hastie and Tibshirani, 1990, Chapman and Hall. Some copies of the 1999 printing by CRC Press replaced page 50 with a page from a history text! page50.ps or page50.pdf
Trevor Hastie, Laura Bachrach, Balasubramanian Narasimhan and May Choo Wang. Flexible Statistical Models for Growth Fragments: a Study of Bone Mineral Acquisition. Compare your own measurements using our online growth tables.
Gareth James and Trevor Hastie. Functional Linear Discriminant Analysis for Irregularly Sampled Curves (2001) Journal of the Royal Statistical Society, Series B (JRSS B) 63, 533-550.
Trevor Hastie, Robert Tibshirani, Michael B Eisen, Ash Alizadeh, Ronald Levy, Louis Staudt, Wing C Chan, David Botstein, Patrick Brown. `Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. This is an online version of the paper, published in the online journal Genome Biology.
Trevor Hastie, Robert Tibshirani, Michael Eisen, Pat Brown, Doug Ross, Uwe Scherf, John Weinstein, Ash Alizadeh, Louis Staudt, David Botstein "Gene Shaving: a New Class of Clustering Methods for Expression Arrays". Postscript (2.9mb) or Adobe pdf (5.4mb) Tech. report. Jan 2000.
James, G., Hastie, T., and Sugar, C. "Principal Component Models for Sparse Functional Data". (2000, Biometrika, 87, 587-602) (pdf). When the data are collections of sampled curves or images, functional principal components produce the principal modes of variation. Here we generalize these procedures to deal with the case when each curve is sparsely and irregularly sampled.
1999
Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P. and Botstein, D. "Imputing Missing Data for Gene Expression Arrays". Technical report (1999), Stanford Statistics Department. pdf (145Kb) or postscript (450Kb)
Tibshirani, R., Hastie, T., Eisen, M., Ross, D., Botstein, D. and Brown, P. "Clustering methods for the analysis of DNA microarray data". Postscript (4.8mb) or Compressed Postscript (1.8mb). Tech. report, Oct. 1999.
D. Ormoneit and T. Hastie. Optimal kernel shapes for local linear regression. In S. A. Solla, T. K. Leen, and K-R. Müller, editors, Advances in Neural Information Processing Systems 12. The MIT Press, 2000.
Tibshirani, R., Lazzeroni, L., Hastie, T., Olshen, A. and Cox, D.R. "A Global Pairwise Approach to Radiation Hybrid Mapping". Technical Report, January 1999. Using data of co-occurrence of hybridized markers after shattering, inference is made of the marker sequence in the chromosome.
1998
Friedman, J., Hastie, T. and Tibshirani, R. (Published version) Additive Logistic Regression: a Statistical View of Boosting. Annals of Statistics 28(2), 337-407 (with discussion). We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far. Here are the slides (2 per page) for my boosting talk.
Crellin, N., Hastie, T. and Johnstone, I. "Statistical Models for Image Sequences". Technical report, submitted to "Human Brain Mapping". We study fMRI sequences of the human brain obtained from experiments involving repetitive neuronal activity. We investigate the functional form of the hemodynamic response function, and provide evidence that the commonly adopted convolution model is inadequate.
Hastie, T. and Tibshirani, R. "Bayesian Backfitting". Stanford Technical report. The Gibbs sampler looks and feels like the backfitting algorithm for fitting additive models. Indeed, a simple modification to backfitting turns it into a Gibbs sampler for spitting out samples from the "posterior" distribution for an additive fit. Published in Statistical Science 15, no. 3 (2000), 196-223.
Wu, T., Hastie, T., Schmidler, S. and Brutlag, D. "Regression Analysis of Multiple Protein Structures". Models for lining up and averaging groups of protein structures.
1997