Theodore M. Wong: Resume

Project details

Cognitive Computing is an emerging field with a goal to develop a coherent, unified, universal mechanism to engineer the mind. Cognitive computing seeks to implement a unified computational theory of the mind, taking advantage of the ability of the brain to integrate ambiguous sensory information, form spatiotemporal associations and abstract concepts, and make decisions and initiate sophisticated coordinated actions. Our approach to cognitive computing is to develop dedicated hardware systems for implementing a canonical cortical circuit that can achieve tremendous gains in power and space efficiency when compared to traditional von Neumann circuits. Such efficiency is crucial when scaling such circuits to the size of a mammalian cortex. Our cortical circuit is a reconfigurable network of spiking neurons that is composed of neuron processing elements connected through synapse memory elements—both akin to the basic building blocks of the brain. To validate and verify the configuration of our hardware, we have developed a simulator that can reproduce hardware functional behavior when testing circuits at the size of a mammalian cortex. Such a simulator also doubles as a research tool for developing and testing new cognitive computing algorithms for implementation on the hardware.

Project details

Distributed, adaptive, hard real-time applications, such as process control or guidance systems, have requirements that go beyond those of traditional real-time systems: accommodation of a dynamic set of applications, autonomous adaptation as application requirements and system resources change, and security between applications from different organizations. Developers need a middleware with features that support developing and running these applications, especially as commercial and defense systems become more network-centric. The Virtual Mission Bus (VMB) middleware, targeted at both distributed IT systems and real-time systems, provides the essential basic services to support these applications and the tools for building more complex services, all while keeping the middleware kernel minimal enough for embedded system use. We successfully used the VMB to prototype a distributed spacecraft cluster system.

Project details

The growth in the amount of data being stored and manipulated for commercial, scientific, and intelligence applications is worsening the manageability and reliability of data storage systems. The expansion of such large-scale storage systems into petabyte capacities puts pressure on cost, leading to systems built out of many cheap but relatively unreliable commodity storage servers. These systems are expensive and difficult to manage—current figures show that management and operation costs are often several times purchase cost—partly because of the number of components to configure and monitor, and partly because system management actions often have unexpected, system-wide side effects. Also, these systems are vulnerable to attack because they have many entry points, and because there are no mechanisms to contain the effects either of attacks or of subsystem failures.

Kybos is a distributed storage system that addresses these issues. It will provide manageable, available, reliable, and secure storage for large data collections, including data that is distributed over multiple geographical sites. Kybos is self-managing, which reduces the cost of administration by eliminating complex management operations and simplifying the model by which administrators configure and monitor the system. Kybos stores data redundantly across multiple commodity storage servers, so that the failure of any one server does not compromise data. Finally, Kybos is built as a loosely coupled federation of servers, so that the compromise or failure of some servers will not impede remaining servers from continuing to take collective action toward system goals.

Our primary application is the self-management of federated (but potentially unreliable) clusters of storage servers, but we anticipate that the algorithms we have developed (and will implement) will have broad applicability to the general class of problems involving the coordination of independent autonomous agents with a collective set of mission goals.

PDF (1.4 MB)

Abstract

Modern machine learning pipelines can produce hundreds of data artifacts (such as features, models, and predictions) throughout their lifecycle. During that time, data scientists need to reproduce errors, update features, re-train on specific data, validate / inspect outputs, and share models and predictions. Doing so requires the ability to publish, discover, and version those artifacts.

This work introduces Disdat, a system to simplify ML pipelines by addressing these data management challenges. Disdat is built on two core data abstractions: bundles and contexts. A bundle is a versioned, typed, immutable collection of data. A context is a sharable set of bundles that can exist on local and cloud storage environments. Disdat provides a bundle management API that we use to extend an existing workflow system to produce and consume bundles. This bundle-based approach to data management has simplified both authoring and deployment of our ML pipelines.

PDF (2.8 MB)

Supporting information PDF (41 MB)

Abstract

Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.

PDF (1.8 MB)

Abstract

The sequential programming paradigm of the von Neumann architecture is wholly unsuited for TrueNorth. Therefore, as our main contribution, we develop a new programming paradigm that permits construction of complex cognitive algorithms and applications while being efficient for TrueNorth and effective for programmer productivity. The programming paradigm consists of (a) an abstraction for a TrueNorth program, named Corelet, for representing a network of neurosynaptic cores that encapsulates all details except external inputs and outputs; (b) an object-oriented Corelet Language for creating, composing, and decomposing corelets; (c) a Corelet Library that acts as an ever-growing repository of reusable corelets from which programmers compose new corelets; and (d) an end-to-end Corelet Laboratory that is a programming environment which integrates with the TrueNorth architectural simulator, Compass, to support all aspects of the programming cycle from design, through development, debugging, and up to deployment. The new paradigm seamlessly scales from a handful of synapses and neurons to networks of neurosynaptic cores of progressively increasing size and complexity. The utility of the new programming paradigm is underscored by the fact that we have designed and implemented more than 100 algorithms as corelets for TrueNorth in a very short time span.

PDF (654 KB)

Abstract

Judiciously balancing the dual objectives of functional capability and implementation/operational cost, we develop a simple, digital, reconfigurable, versatile spiking neuron model that supports one-to-one equivalence between hardware and simulation and is implementable using only 1272 ASIC gates. Starting with the classic leaky integrate-and-fire neuron, we add: (a) configurable and reproducible stochasticity to the input, the state, and the output; (b) four leak modes that bias the internal state dynamics; (c) deterministic and stochastic thresholds; and (d) six reset modes for rich finite-state behavior. The model supports a wide variety of computational functions and neural codes. We capture 50+ neuron behaviors in a library for hierarchical composition of complex computations and behaviors. Although designed with cognitive algorithms and applications in mind, serendipitously, the neuron model can qualitatively replicate the 20 biologically-relevant behaviors of a dynamical neuron model.

PDF (1.4 MB)

Abstract

The non-von Neumann nature of the TrueNorth architecture necessitates a novel approach to efficient system design. To this end, we have developed a set of abstractions, algorithms, and applications that are natively efficient for TrueNorth. First, we developed repeatedly-used abstractions that span neural codes (such as binary, rate, population, and time-to-spike), long-range connectivity, and short-range connectivity. Second, we implemented ten algorithms that include convolution networks, spectral content estimators, liquid state machines, restricted Boltzmann machines, hidden Markov models, looming detection, temporal pattern matching, and various classifiers. Third, we demonstrate seven applications that include speaker recognition, music composer recognition, digit recognition, sequence prediction, collision avoidance, optical flow, and eye detection. Our results showcase the parallelism, versatility, rich connectivity, spatio-temporality, and multi-modality of the TrueNorth architecture as well as compositionality of the corelet programming paradigm and the flexibility of the underlying neuron model.

PDF (1.6 MB)

Abstract

Inspired by the function, power, and volume of the organic brain, we are developing TrueNorth, a novel modular, non-von Neumann, ultra-low power, compact architecture. TrueNorth consists of a scalable network of neurosynaptic cores, with each core containing neurons, dendrites, synapses, and axons. To set sail for TrueNorth, we developed Compass, a multi-threaded, massively parallel functional simulator and a parallel compiler that maps a network of long-distance pathways in the macaque monkey brain to TrueNorth. We demonstrate near-perfect weak scaling on a 16 rack IBM® Blue Gene®/Q (262144 CPUs, 256 TB memory), achieving an unprecedented scale of 256 million neurosynaptic cores containing 65 billion neurons and 16 trillion synapses running only 388× slower than real-time with an average spiking rate of 8.1 Hz. By using emerging PGAS communication primitives, we also demonstrate 2× better real-time performance over MPI primitives on a 4 rack Blue Gene/P (16384 CPUs, 16 TB memory).

PDF (992 KB)

Abstract

Our paper is the first in a series of publications to document the development of a fractionated space system. We explain the derivation of the technical approach for system development and present the preliminary architecture. We will publish future papers following the PDR, CDR, and flight demonstration.

PDF (256 KB)

Abstract

Large- and small-scale storage systems frequently serve a mixture of workloads, an increasing number of which require some form of performance guarantee. Providing guaranteed disk performance—the equivalent of a “virtual disk”—is challenging because disk requests are non-preemptible and their execution times are stateful, partially non-deterministic, and can vary by orders of magnitude. Guaranteeing throughput, the standard measure of disk performance, requires worst-case I/O time assumptions orders of magnitude greater than average I/O times, with correspondingly low performance and poor control of the resource allocation. We show that disk time utilization—analogous to CPU utilization in CPU scheduling and the only fully provisionable aspect of disk performance—yields greater control, more efficient use of disk resources, and better isolation between request streams than bandwidth or I/O rate when used as the basis for disk reservation and scheduling.

PDF (400 KB)

Abstract

Guaranteed I/O performance is needed for a variety of applications ranging from real-time data collection to desktop multimedia to large-scale scientific simulations. Reservations on throughput, the standard measure of disk performance, fail to effectively manage disk performance due to the orders of magnitude difference between best-, average-, and worst-case response times, allowing reservation of less than 0.01% of the achievable bandwidth. We show that by reserving disk resources in terms of utilization, it is possible to create a disk scheduler that supports reservation of nearly 100% of the disk resources, provides arbitrarily hard or soft guarantees depending upon application needs, and yields efficiency as good or better than best-effort disk schedulers tuned for performance. We present the architecture of our scheduler, prove the correctness of its algorithms, and provide results demonstrating its effectiveness.

PDF (130 KB)

Abstract

Many applications— for example, scientific simulation, real-time data acquisition, and distributed reservation systems— have I/O performance requirements, yet most large, distributed storage systems lack the ability to guarantee I/O performance. We are working on end-to-end performance management in scalable, distributed storage systems. The kinds of storage systems we are targeting include large high-performance computing (HPC) clusters, which require both large data volumes and high I/O rates, as well as large-scale general-purpose storage systems.

PDF (136 KB)

Abstract

Much of the practical work in the autonomic management of storage systems has taken the “bolt-on” approach: take existing systems and add a separate management system on the side. While this approach can improve legacy systems, it has several problems, including scaling to heterogeneous and large systems and maintaining consistency between the system and the management model. We argue for a different approach, where autonomic management is woven throughout a system, as in the K2 distributed storage system that we are implementing. This distributes responsibility for management operations over all nodes according to ability and security, and stores management state as part of the entities being managed. Decision algorithms set general configuration goals and then let many system components work in parallel to move toward the goals.

PDF (424 KB)

Abstract

Large-scale storage systems often hold data for multiple applications and users. A problem in such systems is isolating applications and users from each other to prevent their corresponding workloads from interacting in unexpected ways. Another is ensuring that each application receives an appropriate level of performance. As part of the solution to these problems, we have designed a hierarchical I/O scheduling algorithm to manage performance resources on an underlying storage device. Our algorithm uses a simple allocation abstraction: an application or user is associated with a corresponding pool of throughput, and manages throughput within its pool by opening sessions. The algorithm ensures that each pool and session receives at least a reserve rate of throughput and caps usage at a limit rate, using hierarchical token buckets and EDF I/O scheduling. Once it has fulfilled the reserves of all active sessions and pools, it shares unused throughput fairly among active sessions and pools such that they tend to receive the same amount. It thus combines deadline scheduling with proportional-style resource sharing in a novel way. We assume that the device performs its own low-level head scheduling, rather than modeling the device in detail. Our implementation shows the correctness of our algorithm, imposes little overhead on the system, and achieves throughput nearly equal to that of an unmanaged device.

PDF (682 KB)

Abstract

This paper provides an overview of the Intelligent Bricks project in progress at IBM Research. It describes common problems faced by data center operators and proposes a comprehensive solution based on brick architectures. Bricks are hardware building blocks. Because of certain properties, defined here, scalable and reliable systems can be built with collections of identical bricks. An important feature is that brick-based systems must survive the failure of any brick without requiring human intervention, as long as most bricks are operational. This simplifies system management and allows very dense and very scalable systems to be built. A prototype storage server in the form of a 3×3×3 array of bricks, capable of storing 26 TB, is operational at the IBM Almaden Research Center. It successfully demonstrates the concepts of the Intelligent Bricks architecture. The paper describes this implementation of brick architectures based on newly developed communication and cooling technologies, the software developed, and techniques for building very reliable systems from low-cost bricks, and it discusses the performance and the future of intelligent brick systems.

PDF (424 KB)

Abstract

We present a new verifiable secret redistribution protocol for threshold sharing schemes that forms a key component of a proposed archival storage system. Our protocol supports redistribution from (m,n) to (m',n') threshold sharing schemes without requiring reconstruction of the original data. The design is motivated by archive systems for which the added security of threshold sharing of data must be accompanied by the flexibility of dynamic shareholder changes. Our protocol enables the dynamic addition or removal of shareholders, and also guards against mobile adversaries. We observe that existing protocols either cannot be extended readily to allow redistribution between different access structures, or have vulnerabilities that allow faulty old shareholders to distribute invalid shares to new shareholders. Our primary contribution is that in our protocol, new shareholders can verify the validity of their shares after redistribution between different access structures.

PDF (264 KB)

Abstract

Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it was only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive.

We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful (sometimes substantial) speedups.

During our investigation, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.

PDF (4.6 MB)

Abstract

Aging is defined as the gradual functional and structural decline of an organism. It is considered a major risk factor for adverse health outcomes and mortality. While chronological time is a strong indicator of the aging process, it does not perfectly predict the underlying physiological state or biological age of an individual. Each individual with the same chronological age (CA) is at a different risk for developing aging-related diseases and mortality, primarily due to their varying lifestyles, genetic compositions, and biological aging processes. Over the past few decades, considerable efforts have been made to develop an informative measure of biological aging. Such a measurement should meet the following criteria as set out by the American Federation for Aging Research: (1) predict remaining life expectancy and disease-specific mortality better than CA, (2) monitor mechanisms underlying the aging process but not a specific disease, (3) be subject to repeated tests without harming the individual nor affecting life expectancy, and (4) be testable in both laboratory animals and humans. affecting life expectancy, and (4) be testable in both laboratory animals and humans.

Blood biomarkers (e.g., total cholesterol or platelet count) are often used to assess the overall health status of an individual in a clinical setting. Blood biomarker-based biological age (BBA) combines these blood biomarkers into a single value to facilitate interpretation of biological aging and better predict lifespan than CA. One of the leading BBA methods is Morgan Levine's PhenoAge (phenotypic age). Levine et al. used the National Health and Nutrition Examination Survey (NHANES) dataset to fit a model to estimate BBA using ten predictor variables: albumin, alkaline phosphatase, creatinine, C-reactive protein, glucose, lymphocyte percentage, mean corpuscular volume, red blood cell distribution width, white blood cell count, and CA. After adjusting for CA, the authors reported that a one-year increase in PhenoAge was associated with a 9% increase in mortality risk.

In 2023, 23andMe, Inc. started offering blood biomarker testing products, which opened up an opportunity to provide a measure of BBA to our customers. In this White Paper, we detail the modeling methodologies and validating procedures used to create the BBA model powering the 23andMe's Biological Age feature. We describe how we expanded on elements of PhenoAge to build 23andMe's Biological Age, a survival model trained on a large cohort from the UK Biobank (UKB). Since a larger set of blood biomarkers are available in 23andMe's blood biomarker panel and UKB than in NHANES, we developed our own model so that we could provide our own feature engineering and train the model on a larger cohort. The model was then validated on several ethnically-diverse subsets from UKB to show that the model performance is consistent across different ancestries. To further ensure our model's performance outside the somewhat limited CA range and ancestral background of the UKB participants, we used the NHANES as a validation set. We further computed the effect size of each biomarker per individual to provide a customized explanation of which biomarkers contribute to an increase or decrease of their BBA. We show that BBA is a better indicator of mortality risk and aging-related events than CA, particularly for cardiovascular and metabolic diseases. We also included a sex specific analysis to show that our result is consistent for both male and female participants (Supplementary Material). As an example of the dynamic nature of BBA, we include a longitudinal case study of an individual to demonstrate BBA's ability to reflect underlying changes to the physiological processes arising from environmental change over time.

PDF (699 KB)

Abstract

Polygenic scores (PGS) strive to estimate the heritable portion of risk for many common diseases and other traits. Genome-wide association studies (GWAS) frequently identify multiple genetic variants with small to moderate individual impact on risk for a condition; many of these variants are commonly single nucleotide polymorphisms (SNPs). To quantify the cumulative impact of these variants on risk, machine learning methods are used to construct statistical models that generate polygenic scores. Recently, advances in modeling methodology have enabled massive increases in the number of genetic variants that can be included in polygenic models, leading to corresponding increases in the proportion of trait variance that these models explain. As a result, PGS are now being used to estimate heritable risk for a wide range of conditions and research is ongoing to evaluate their potential utility as part of clinical decision making.

The key factor that limits researchers' ability to create large polygenic models is the size of the training cohort. Very large sample sizes are necessary both to identify genetic variants associated with a disease and to estimate their joint contribution to risk. Additionally, obtaining samples from diverse populations is necessary to create models that are calibrated to these populations, whether by assessing how well a model developed using data from a particular ancestry group (usually European) generalizes to other (often non-European) groups, or by developing models using data from various populations themselves. With over 14 million kits sold and approximately 80% of customers—including customers of many different ancestries—consenting to participate in research, 23andMe has a unique ability to develop large PGS that predict a wide range of health conditions and traits and to optimize and assess PGS performance across people with diverse ancestral backgrounds. Analyses of the company's genetic and self-reported health data show that we can replicate GWAS on clinically collected health information. Over the last several years, 23andMe has used PGS as the basis of dozens of customer reports on topics ranging from the ability to match a musical pitch to the likelihood of developing type 2 diabetes.

Here we detail the modeling methodologies and evaluation procedures used to create the PGS behind new 23andMe Health Predisposition and Wellness reports on common health conditions. The Appendices to this White Paper further summarize the performance and characteristics of each PGS used in recently released reports (starting in 2024). We intend for this White Paper and the Appendices to be living documents that will be updated as methodologies change and new PGS-based genetic reports are released. A change log is provided at the bottom of this document to describe significant updates.

PDF (221 KB)

Abstract

Since the final submission of our work on the Compass scalable simulator for the IBM TrueNorth Cognitive Computing architecture, we have simulated an unprecedented 2.084 billion neurosynaptic cores containing 53×10¹⁰ neurons and 1.37×10¹⁴ synapses running at only 1542× slower than real time. We attained this scale by using the Sequoia 96-rack IBM® Blue Gene®/Q supercomputer at Lawrence Livermore National Labs. By comparison, the ultimate vision of the DARPA SyNAPSE program is to build a cognitive computing architecture with 10¹⁰ neurons and 10¹⁴ synapses, inspired by the following: Shepherd estimates the number of synapses in the human brain as 0.6×10¹⁴, and Koch estimates the number of synapses in the human brain as 2.4×10¹⁴.

It is important to clarify that we have not built a biologically realistic simulation of the complete human brain. Rather, we have simulated a novel modular, scalable, non-von Neumann, ultra-low power, cognitive computing architecture at the DARPA SyNAPSE metric of 10¹⁴ synapses that itself is inspired by the number of synapses in the human brain. We mathematically abstract away computation (“neurons”), memory (“synapses”), and communication (“axons”, “dendrites”) from biological detail towards engineering goals of maximizing function (utility, applications) and minimizing hardware cost (power, area, delay) and design complexity.

PDF (257 KB)

Abstract

Distributed, adaptive, hard real-time applications, such as process control or guidance systems, have requirements that go beyond those of traditional real-time systems: accommodation of a dynamic set of applications, autonomous adaptation as application requirements and system resources change, and security between applications from different organization. Developers need a middleware with features that support developing and running these applications, especially as commercial and defense systems become more "network-centric". The Virtual Mission Bus (VMB) middleware, targeted at both distributed IT systems and real-time systems, provides the essential basic services to support these applications and the tools for building more complex services, all while keeping the middleware kernel minimal enough for embedded system use. We successfully used the VMB to prototype a distributed spacecraft cluster system.

PDF (197 KB)

Abstract

Current tools for storage system configuration management make offline decisions, recovering from, instead of preventing, performance specification violations. The consequences are severe in a large-scale system that requires complex actions to recover from failures, and can result in a temporary shutdown of the system. We introduce Kybos, a distributed storage system that makes online, autonomous responses to system changes. It runs on clusters of intelligent bricks, which provide local enforcement of global performance and reliability specifications and so isolate both recovery and application IO traffic. A management agent within Kybos translates simple, high-level specifications into brick-level enforcement targets, invoking centralized algorithms only when taking actions that require global state. Our initial implementation shows that this approach works well.

PDF (714 KB) PostScript (1642 KB)

Abstract

Modern society has produced a wealth of data to preserve for the long term. Some data we keep for cultural benefit, in order to make it available to future generations, while other data we keep because of legal imperatives. One way to preserve such data is to store it using survivable storage systems. Survivable storage is distinct from reliable storage in that it tolerates confidentiality failures in which unauthorized users compromise component storage servers, as well as crash failures of servers. Thus, a survivable storage system can guarantee both the availability and the confidentiality of stored data.

Research into survivable storage systems investigates the use of m-of-n threshold sharing schemes to distribute data to servers, in which each server receives a share of the data. Any m shares can be used to reconstruct the data, but any m - 1 shares reveal no information about the data. The central thesis of this dissertation is that to truly preserve data for the long term, a system that uses threshold schemes must incorporate recovery protocols able to overcome server failures, adapt to changing availability or confidentiality requirements, and operate in a decentralized manner.

To support the thesis, I present the design and experimental performance analysis of a verifiable secret redistribution protocol for threshold sharing schemes. The protocol redistributes shares of data from old to new, possibly disjoint, sets of servers, such that new shares generated by redistribution cannot be combined with old shares to reconstruct the original data. The protocol is decentralized, and does not require intermediate reconstruction of the data; thus, it does not introduce a central point of failure or risk the exposure of the data during execution. The protocol incorporates a verification capability that enables new servers to confirm that their shares can be used to reconstruct the original data.

Details

According to one embodiment of the present invention, a method for protecting authenticated communication in dynamic federated environments is provided. The method includes distributing shares of a private signature key to a group of users. When switching from an existing to a new group of users, the method includes producing a plurality of sub-shares from each of the distributed shares of existing users, with each sub-share being accompanied by a corresponding validity proof. The sub-shares from multiple existing users are combined to generate a set of shares for new users, with each new share being derived from sub-shares from multiple existing users.

Details

Embodiments of the invention relate to a function-level simulator for modeling a neurosynaptic chip. One embodiment comprises simulating a neural network using an object-oriented framework including a plurality of object-oriented classes. Each class corresponds to a component of a neural network. Running a simulation model of the neural network includes instantiating multiple simulation objects from the classes. Each simulation object is an instance of one of the classes.

Details

A method for implementation within a scheduler for a processor is adescribed. The method includes receiving a plurality of jobs from an earliest deadline first (EDF) schedule, wherein the scheduler implements an EDF scheduling model. The method also includes receiving a separate job from a source other than the EDF schedule. The separate job has a fixed scheduling requirement with a specific execution time. The method also includes determining an amount of available utilization capacity of the processor and inserting the separate job into an execution plan of the processor with the plurality of jobs from the EDF schedule in response to a determination that the available utilization capacity of the processor is sufficient to execute the separate job according to the fixed scheduling requirement associated with the separate job.

Details

A scheduler selects an I/O from a session of a pool and updates token buckets associated with resource limits and reserves for the session and the pool and statistics used in determining fair sharing. To select an I/O, the scheduler identifies sessions with a non-empty queue, identifies head I/Os in the queues, computes for the head I/O a deadline using session and pool reserve buckets and a release time using session and pool limit buckets, and selects a head I/O with an earliest deadline that is past the release time. If the deadline of the selected candidate head I/O is in the past, the scheduler transfers the selected head I/O to the tail of the storage device queue. Otherwise, the scheduler selects the pool with the least amount of I/O traffic according to a session fair share estimator.

Details

A resource management system uses a virtual resource pool distributed across a set of storage devices to manage resources in a distributed storage system. The storage devices dedicate a resource in an allocation pool available to the virtual resource pool. The virtual resource pool is a virtual storage server in which an application receives at least a predetermined amount of storage capacity, a predetermined level of performance, or a predetermined reliability, represented by tokens. A virtual resource pool manager gives the tokens to an application. The application transmits the tokens along with the commands to the storage device. The token represents a right to consume up to some limit of resources on a specific storage device in a computing session. The storage device locally tracks resource consumption through the token.

Details

A computer system with mechanisms for exclusive caching that avoids the accumulation of duplicate copies of information in host and storage system caches. A computer system according to these exclusive caching techniques includes a host system having a host cache and a storage system having a storage system cache and functionality for performing demote operations to coordinate the placement of information in the host cache to the storage system caches.

Details

A computer system cache monitors the effectiveness of data inserted into a cache by one or more sources to determine which source should receive preferential treatment when updating the cache. The cache may be part of a computer system that includes a plurality of host systems; each host system includes a host cache, connected to a storage system having a storage system cache. Ghost caches are used to record hits from the plurality of host systems performing operations for storing and retrieving data from the storage system cache. The storage system cache includes a cache controller that is operable to calculate a merit figure and determine an insertion point in a queue associated with the storage system cache based on the merit figure. The merit figure is calculated using a weighting algorithm for weighting hits from the plurality of sources recorded in the ghost caches.

Current position	Fantail MV, Mountain View, CA Owner/Software Systems Architect (January 2025 - Present) Founded Fantail MV to provide software systems architecture and engineering services, with a focus on AI/ML, data science, and distributed systems. Projects include: Managing DevOps for Xella Health, a company providing AI-powered insights revealing the root causes of women's health. Developing an LLM-based medical assistant to advise patients on treatment options for conception given their personal medical history. Developing and deploying a prototype health report service for estimating the onset of menopause from genetic data using machine learning models.
Technical skills	Languages: Go, Python, C, C++, Java, Rust Cloud environments: AWS+Terraform, including: API Gateway, Batch, CloudFormation, DynamoDB, EC2, IAM, Lambda, RDS, S3, Step Functions Development and build environments: Visual Studio Code + Copilot, Ant, Eclipse, Make, Maven
Previous positions	23andMe, Inc., Sunnyvale, CA Engineering Manager, Health Machine Learning (September 2022 - November 2024) Team/Technical Lead (July 2021 - September 2022) Technical Lead (April 2021 - July 2021) Senior Software Engineer (January 2018 - April 2021) Led the Health Machine Learning Engineering team, which designed and built pipelines for training machine learning models that underpin many of the direct-to-consumer reports in the 23andMe Health product. Responsibilities included managing project timelines and implementation quality, mentoring team members in their career development, collaborating with cross-functional teams, and implementing features. Other projects included: Developing a system to perform genome-wide association studies on 23andMe customer data. Modularizing the prediction of customer genetic traits within the 23andMe product infrastructure. Human Longevity, Inc., San Diego, CA / Mountain View, CA Software Engineer (September 2014 - November 2017) Worked on multiple projects that spanned basic scientific research, infrastructure development, and customer-facing product deployment. These included: Co-developing a software framework for data analysis pipelines to track the version dependencies between output data, code used to generate the output, and input data. Studying the relationship between genetics, voice, and demographic data as part of a research study to predict faces and other phenotypes from the genome. Illumina, Inc., San Diego, CA / San Francisco, CA Staff Software Engineer (July 2013 - September 2014) Re-engineered a human mutation mapping pipeline SaaS system to support the identification of the genetic variations that drive the onset and growth of cancers. IBM Research - Almaden, San Jose, CA Research Staff Member (December 2003 - July 2013) Led the development of the IBM Neuro Synaptic Core Simulator (NSCS) as part of the IBM SyNAPSE project. The NSCS modeled a reconfigurable cortical hardware circuit for emulating the cognitive abilities of a mammalian cortex. Also led the design and implementation of the IBM Virtual Mission Bus (VMB) middleware system for supporting distributed, adaptive, hard real-time applications in a dynamic cluster of orbital satellites. Projects included: The IBM Cognitive Computing project for DARPA SyNAPSE (October 2010 - July 2013) Project details Cognitive Computing is an emerging field with a goal to develop a coherent, unified, universal mechanism to engineer the mind. Cognitive computing seeks to implement a unified computational theory of the mind, taking advantage of the ability of the brain to integrate ambiguous sensory information, form spatiotemporal associations and abstract concepts, and make decisions and initiate sophisticated coordinated actions. Our approach to cognitive computing is to develop dedicated hardware systems for implementing a canonical cortical circuit that can achieve tremendous gains in power and space efficiency when compared to traditional von Neumann circuits. Such efficiency is crucial when scaling such circuits to the size of a mammalian cortex. Our cortical circuit is a reconfigurable network of spiking neurons that is composed of neuron processing elements connected through synapse memory elements—both akin to the basic building blocks of the brain. To validate and verify the configuration of our hardware, we have developed a simulator that can reproduce hardware functional behavior when testing circuits at the size of a mammalian cortex. Such a simulator also doubles as a research tool for developing and testing new cognitive computing algorithms for implementation on the hardware. The Virtual Mission Bus within the Pleiades cluster for DARPA System F6 (February 2008 - October 2010) Project details Distributed, adaptive, hard real-time applications, such as process control or guidance systems, have requirements that go beyond those of traditional real-time systems: accommodation of a dynamic set of applications, autonomous adaptation as application requirements and system resources change, and security between applications from different organizations. Developers need a middleware with features that support developing and running these applications, especially as commercial and defense systems become more network-centric. The Virtual Mission Bus (VMB) middleware, targeted at both distributed IT systems and real-time systems, provides the essential basic services to support these applications and the tools for building more complex services, all while keeping the middleware kernel minimal enough for embedded system use. We successfully used the VMB to prototype a distributed spacecraft cluster system. Kybos: Self-management for distributed brick-based storage (December 2003 - February 2008) Project details The growth in the amount of data being stored and manipulated for commercial, scientific, and intelligence applications is worsening the manageability and reliability of data storage systems. The expansion of such large-scale storage systems into petabyte capacities puts pressure on cost, leading to systems built out of many cheap but relatively unreliable commodity storage servers. These systems are expensive and difficult to manage—current figures show that management and operation costs are often several times purchase cost—partly because of the number of components to configure and monitor, and partly because system management actions often have unexpected, system-wide side effects. Also, these systems are vulnerable to attack because they have many entry points, and because there are no mechanisms to contain the effects either of attacks or of subsystem failures. Kybos is a distributed storage system that addresses these issues. It will provide manageable, available, reliable, and secure storage for large data collections, including data that is distributed over multiple geographical sites. Kybos is self-managing, which reduces the cost of administration by eliminating complex management operations and simplifying the model by which administrators configure and monitor the system. Kybos stores data redundantly across multiple commodity storage servers, so that the failure of any one server does not compromise data. Finally, Kybos is built as a loosely coupled federation of servers, so that the compromise or failure of some servers will not impede remaining servers from continuing to take collective action toward system goals. Our primary application is the self-management of federated (but potentially unreliable) clusters of storage servers, but we anticipate that the algorithms we have developed (and will implement) will have broad applicability to the general class of problems involving the coordination of independent autonomous agents with a collective set of mission goals. Baskin School of Engineering, UC Santa Cruz, Santa Cruz, CA Research Associate (April 2006 - April 2010) Collaborated with the Institute for Scalable Scientific Data Management on research in storage systems for high-end computing, with a focus on resource management and end-to-end I/O performance guarantees. Storage Systems Program, Hewlett-Packard Laboratories, Palo Alto, CA Summer Research Intern (June 1999 - August 1999) Conducted research into cooperative caching methods for high-end storage systems and demonstrated that a simple algorithm could yield useful (sometimes substantial) speedups. Information Technology Section, Mann Library, Cornell University, Ithaca, NY Senior Programmer/Analyst (January 1997 - August 1997) Designed, implemented, and deployed a storage and retrieval system for delivering digitally scanned monographs over the Internet as part of the Core Historical Literature of Agriculture project. Stratus Computer Corporation (now Stratus Technologies), Ithaca, NY Software Engineer, Message Distribution Service (MDS) (May 1996 - January 1997) Software Engineer, Isis Software Developer's Kit (SDK) (August 1995 - May 1996) Collaborated in the development of a more performant MDS release and provided maintenance engineering and quality assurance testing for the SDK. Cornell Information Technologies, Cornell University, Ithaca, NY Technical Consultant (October 1993 - April 1995) Provided front-line technical support for Windows- and MSDOS-based platforms. JPMorgan & Co., Incorporated, New York, NY Summer Intern, Global Technology and Operations (June 1994 - August 1994) Designed and developed software for computing credit exposure on interest-rate derivatives.
Educational background	Carnegie Mellon University (August 1997 - May 2004) Doctor of Philosophy in Computer Science Cornell University (January 1994 - August 1995) Master of Engineering in Computer Science Oxford University (October 1989 - June 1993) Bachelor of Arts in Engineering Science
Refereed publications	Ken Yocum, Sean Rowan, and Jonathan Lunt, and Theodore M. Wong. Disdat: Bundle data management for machine learning pipelines . In Proceedings of the 2019 USENIX Conference on Operational Machine Learning (OpML 2019), May 2019 PDF (1.4 MB) Abstract Modern machine learning pipelines can produce hundreds of data artifacts (such as features, models, and predictions) throughout their lifecycle. During that time, data scientists need to reproduce errors, update features, re-train on specific data, validate / inspect outputs, and share models and predictions. Doing so requires the ability to publish, discover, and version those artifacts. This work introduces Disdat, a system to simplify ML pipelines by addressing these data management challenges. Disdat is built on two core data abstractions: bundles and contexts. A bundle is a versioned, typed, immutable collection of data. A context is a sharable set of bundles that can exist on local and cloud storage environments. Disdat provides a bundle management API that we use to extend an existing workflow system to produce and consume bundles. This bundle-based approach to data management has simplified both authoring and deployment of our ML pipelines. Christoph Lippert et al. Identification of individuals by trait prediction using whole-genome sequencing data . In Proceedings of the National Academy of Sciences (PNAS), September 2017 PDF (2.8 MB) Supporting information PDF (41 MB) Abstract Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications. Arnon Amir et al. Cognitive computing programming paradigm: A Corelet Language for composing networks of neurosynaptic cores In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN 2013), August 2013 PDF (1.8 MB) Abstract The sequential programming paradigm of the von Neumann architecture is wholly unsuited for TrueNorth. Therefore, as our main contribution, we develop a new programming paradigm that permits construction of complex cognitive algorithms and applications while being efficient for TrueNorth and effective for programmer productivity. The programming paradigm consists of (a) an abstraction for a TrueNorth program, named Corelet, for representing a network of neurosynaptic cores that encapsulates all details except external inputs and outputs; (b) an object-oriented Corelet Language for creating, composing, and decomposing corelets; (c) a Corelet Library that acts as an ever-growing repository of reusable corelets from which programmers compose new corelets; and (d) an end-to-end Corelet Laboratory that is a programming environment which integrates with the TrueNorth architectural simulator, Compass, to support all aspects of the programming cycle from design, through development, debugging, and up to deployment. The new paradigm seamlessly scales from a handful of synapses and neurons to networks of neurosynaptic cores of progressively increasing size and complexity. The utility of the new programming paradigm is underscored by the fact that we have designed and implemented more than 100 algorithms as corelets for TrueNorth in a very short time span. Andrew S. Cassidy et al. Cognitive computing building block: A versatile and efficient digital neuron model for neurosynaptic cores. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN 2013), August 2013 PDF (654 KB) Abstract Judiciously balancing the dual objectives of functional capability and implementation/operational cost, we develop a simple, digital, reconfigurable, versatile spiking neuron model that supports one-to-one equivalence between hardware and simulation and is implementable using only 1272 ASIC gates. Starting with the classic leaky integrate-and-fire neuron, we add: (a) configurable and reproducible stochasticity to the input, the state, and the output; (b) four leak modes that bias the internal state dynamics; (c) deterministic and stochastic thresholds; and (d) six reset modes for rich finite-state behavior. The model supports a wide variety of computational functions and neural codes. We capture 50+ neuron behaviors in a library for hierarchical composition of complex computations and behaviors. Although designed with cognitive algorithms and applications in mind, serendipitously, the neuron model can qualitatively replicate the 20 biologically-relevant behaviors of a dynamical neuron model. Steve K. Esser et al. Cognitive computing systems: Algorithms and applications for networks of neurosynaptic cores . In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN 2013), August 2013 PDF (1.4 MB) Abstract The non-von Neumann nature of the TrueNorth architecture necessitates a novel approach to efficient system design. To this end, we have developed a set of abstractions, algorithms, and applications that are natively efficient for TrueNorth. First, we developed repeatedly-used abstractions that span neural codes (such as binary, rate, population, and time-to-spike), long-range connectivity, and short-range connectivity. Second, we implemented ten algorithms that include convolution networks, spectral content estimators, liquid state machines, restricted Boltzmann machines, hidden Markov models, looming detection, temporal pattern matching, and various classifiers. Third, we demonstrate seven applications that include speaker recognition, music composer recognition, digit recognition, sequence prediction, collision avoidance, optical flow, and eye detection. Our results showcase the parallelism, versatility, rich connectivity, spatio-temporality, and multi-modality of the TrueNorth architecture as well as compositionality of the corelet programming paradigm and the flexibility of the underlying neuron model. Emmett McQuinn, Pallab Datta, Myron D. Flickner, William P. Risk, Dharmendra S. Modha, Theodore M. Wong, Raghavendra Singh, Steven K. Esser, and Rathinakumar Appuswamy. Connectivity of a Cognitive Computer Based on the Macaque Brain. Science, 339(6119), pp. 512–513, 1 February 2013 Robert Preissl, Theodore M. Wong, Pallab Datta, Raghavendra Singh, Steven Esser, William Risk, Horst Simon, Myron Flickner, and Dharmendra S. Modha. Compass: A Scalable Simulator for an Architecture for Cognitive Computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 2012), November 2012 PDF (1.6 MB) Abstract Inspired by the function, power, and volume of the organic brain, we are developing TrueNorth, a novel modular, non-von Neumann, ultra-low power, compact architecture. TrueNorth consists of a scalable network of neurosynaptic cores, with each core containing neurons, dendrites, synapses, and axons. To set sail for TrueNorth, we developed Compass, a multi-threaded, massively parallel functional simulator and a parallel compiler that maps a network of long-distance pathways in the macaque monkey brain to TrueNorth. We demonstrate near-perfect weak scaling on a 16 rack IBM® Blue Gene®/Q (262144 CPUs, 256 TB memory), achieving an unprecedented scale of 256 million neurosynaptic cores containing 65 billion neurons and 16 trillion synapses running only 388× slower than real-time with an average spiking rate of 8.1 Hz. By using emerging PGAS communication primitives, we also demonstrate 2× better real-time performance over MPI primitives on a 4 rack Blue Gene/P (16384 CPUs, 16 TB memory). David M. LoBosco, Glen E. Cameron, Richard A. Golding, and Theodore M. Wong. The Pleiades fractionated space system architecture and the future of national security space . In Proceedings of the AIAA SPACE 2008 Conference, September 2008 PDF (992 KB) Abstract Our paper is the first in a series of publications to document the development of a fractionated space system. We explain the derivation of the technical approach for system development and present the preliminary architecture. We will publish future papers following the PDR, CDR, and flight demonstration. Tim Kaldewey, Theodore M. Wong, Richard Golding, Anna Povzner, Scott Brandt, and Carlos Maltzahn. Virtualizing disk performance . In Proceedings of the 14th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2008), April 2008 (Best student paper) PDF (256 KB) Abstract Large- and small-scale storage systems frequently serve a mixture of workloads, an increasing number of which require some form of performance guarantee. Providing guaranteed disk performance—the equivalent of a “virtual disk”—is challenging because disk requests are non-preemptible and their execution times are stateful, partially non-deterministic, and can vary by orders of magnitude. Guaranteeing throughput, the standard measure of disk performance, requires worst-case I/O time assumptions orders of magnitude greater than average I/O times, with correspondingly low performance and poor control of the resource allocation. We show that disk time utilization—analogous to CPU utilization in CPU scheduling and the only fully provisionable aspect of disk performance—yields greater control, more efficient use of disk resources, and better isolation between request streams than bandwidth or I/O rate when used as the basis for disk reservation and scheduling. Anna Povzner, Tim Kaldewey, Scott Brandt, Richard Golding, Theodore M. Wong, and Carlos Maltzahn. Efficient guaranteed disk request scheduling with Fahrrad . In Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2008 (EuroSys 2008), April 2008 PDF (400 KB) Abstract Guaranteed I/O performance is needed for a variety of applications ranging from real-time data collection to desktop multimedia to large-scale scientific simulations. Reservations on throughput, the standard measure of disk performance, fail to effectively manage disk performance due to the orders of magnitude difference between best-, average-, and worst-case response times, allowing reservation of less than 0.01% of the achievable bandwidth. We show that by reserving disk resources in terms of utilization, it is possible to create a disk scheduler that supports reservation of nearly 100% of the disk resources, provides arbitrarily hard or soft guarantees depending upon application needs, and yields efficiency as good or better than best-effort disk schedulers tuned for performance. We present the architecture of our scheduler, prove the correctness of its algorithms, and provide results demonstrating its effectiveness. David O. Bigelow, Suresh Iyer, Tim Kaldewey, Roberto C. Pineiro, Anna Povzner, Scott A. Brandt, Richard A. Golding, Theodore M. Wong, and Carlos Maltzahn. End-to-end performance management for scalable distributed storage . In Proceedings of the Petascale Data Storage Workshop, November 2007 PDF (130 KB) Abstract Many applications— for example, scientific simulation, real-time data acquisition, and distributed reservation systems— have I/O performance requirements, yet most large, distributed storage systems lack the ability to guarantee I/O performance. We are working on end-to-end performance management in scalable, distributed storage systems. The kinds of storage systems we are targeting include large high-performance computing (HPC) clusters, which require both large data volumes and high I/O rates, as well as large-scale general-purpose storage systems. Richard A. Golding and Theodore M. Wong. Walking toward moving goalposts: agile management for evolving systems . In Proceedings of the First Workshop on Hot Topics in Autonomic Computing (HotAC I), June 2006 PDF (136 KB) Abstract Much of the practical work in the autonomic management of storage systems has taken the “bolt-on” approach: take existing systems and add a separate management system on the side. While this approach can improve legacy systems, it has several problems, including scaling to heterogeneous and large systems and maintaining consistency between the system and the management model. We argue for a different approach, where autonomic management is woven throughout a system, as in the K2 distributed storage system that we are implementing. This distributes responsibility for management operations over all nodes according to ability and security, and stores management state as part of the entities being managed. Decision algorithms set general configuration goals and then let many system components work in parallel to move toward the goals. Theodore M. Wong, Richard A. Golding, Caixue Lin, and Ralph A. Becker-Szendy. Zygaria: Storage performance as a managed resource . In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), April 2006 PDF (424 KB) Abstract Large-scale storage systems often hold data for multiple applications and users. A problem in such systems is isolating applications and users from each other to prevent their corresponding workloads from interacting in unexpected ways. Another is ensuring that each application receives an appropriate level of performance. As part of the solution to these problems, we have designed a hierarchical I/O scheduling algorithm to manage performance resources on an underlying storage device. Our algorithm uses a simple allocation abstraction: an application or user is associated with a corresponding pool of throughput, and manages throughput within its pool by opening sessions. The algorithm ensures that each pool and session receives at least a reserve rate of throughput and caps usage at a limit rate, using hierarchical token buckets and EDF I/O scheduling. Once it has fulfilled the reserves of all active sessions and pools, it shares unused throughput fairly among active sessions and pools such that they tend to receive the same amount. It thus combines deadline scheduling with proportional-style resource sharing in a novel way. We assume that the device performs its own low-level head scheduling, rather than modeling the device in detail. Our implementation shows the correctness of our algorithm, imposes little overhead on the system, and achieves throughput nearly equal to that of an unmanaged device. Winfried W. Wilcke et al. IBM Intelligent Bricks project—Petabytes and beyond . IBM Journal of Research and Development, 50(2/3), pp. 181–198, March–May 2006 PDF (682 KB) Abstract This paper provides an overview of the Intelligent Bricks project in progress at IBM Research. It describes common problems faced by data center operators and proposes a comprehensive solution based on brick architectures. Bricks are hardware building blocks. Because of certain properties, defined here, scalable and reliable systems can be built with collections of identical bricks. An important feature is that brick-based systems must survive the failure of any brick without requiring human intervention, as long as most bricks are operational. This simplifies system management and allows very dense and very scalable systems to be built. A prototype storage server in the form of a 3×3×3 array of bricks, capable of storing 26 TB, is operational at the IBM Almaden Research Center. It successfully demonstrates the concepts of the Intelligent Bricks architecture. The paper describes this implementation of brick architectures based on newly developed communication and cooling technologies, the software developed, and techniques for building very reliable systems from low-cost bricks, and it discusses the performance and the future of intelligent brick systems. Theodore M. Wong, Chenxi Wang, and Jeannette M. Wing. Verifiable secret redistribution for archive systems . In Proceedings of the First International IEEE Security in Storage Workshop (SISW 2002), December 2002 PDF (424 KB) Abstract We present a new verifiable secret redistribution protocol for threshold sharing schemes that forms a key component of a proposed archival storage system. Our protocol supports redistribution from (m,n) to (m',n') threshold sharing schemes without requiring reconstruction of the original data. The design is motivated by archive systems for which the added security of threshold sharing of data must be accompanied by the flexibility of dynamic shareholder changes. Our protocol enables the dynamic addition or removal of shareholders, and also guards against mobile adversaries. We observe that existing protocols either cannot be extended readily to allow redistribution between different access structures, or have vulnerabilities that allow faulty old shareholders to distribute invalid shares to new shareholders. Our primary contribution is that in our protocol, new shareholders can verify the validity of their shares after redistribution between different access structures. Theodore M. Wong and John Wilkes. My cache or yours? Making storage more exclusive . In Proceedings of the USENIX Annual Technical Conference (USENIX 2002), June 2002, pp. 161-175 PDF (264 KB) Abstract Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it was only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive. We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful (sometimes substantial) speedups. During our investigation, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.
Notable technical reports	Boyoung Yoo (Quinn), Xi (Iris) Yang, Teresa Filshtein Sonmez, Jianan Zhan, Stacey B. Detweiler, James R. Ashenhurst, Alison L. Chubb, Sahil Hansalia, Akele K. Reed, Theodore M. Wong, Ryanne R. Wu, Bertram L. Koelsch, Anna Guan, Sarah B. Laskey, Michael V. Holmes. Development and testing of estimating Biological Age from blood based signatures . 23andMe White Paper 23-26, May 2024 PDF (4.6 MB) Abstract Aging is defined as the gradual functional and structural decline of an organism. It is considered a major risk factor for adverse health outcomes and mortality. While chronological time is a strong indicator of the aging process, it does not perfectly predict the underlying physiological state or biological age of an individual. Each individual with the same chronological age (CA) is at a different risk for developing aging-related diseases and mortality, primarily due to their varying lifestyles, genetic compositions, and biological aging processes. Over the past few decades, considerable efforts have been made to develop an informative measure of biological aging. Such a measurement should meet the following criteria as set out by the American Federation for Aging Research: (1) predict remaining life expectancy and disease-specific mortality better than CA, (2) monitor mechanisms underlying the aging process but not a specific disease, (3) be subject to repeated tests without harming the individual nor affecting life expectancy, and (4) be testable in both laboratory animals and humans. affecting life expectancy, and (4) be testable in both laboratory animals and humans. Blood biomarkers (e.g., total cholesterol or platelet count) are often used to assess the overall health status of an individual in a clinical setting. Blood biomarker-based biological age (BBA) combines these blood biomarkers into a single value to facilitate interpretation of biological aging and better predict lifespan than CA. One of the leading BBA methods is Morgan Levine's PhenoAge (phenotypic age). Levine et al. used the National Health and Nutrition Examination Survey (NHANES) dataset to fit a model to estimate BBA using ten predictor variables: albumin, alkaline phosphatase, creatinine, C-reactive protein, glucose, lymphocyte percentage, mean corpuscular volume, red blood cell distribution width, white blood cell count, and CA. After adjusting for CA, the authors reported that a one-year increase in PhenoAge was associated with a 9% increase in mortality risk. In 2023, 23andMe, Inc. started offering blood biomarker testing products, which opened up an opportunity to provide a measure of BBA to our customers. In this White Paper, we detail the modeling methodologies and validating procedures used to create the BBA model powering the 23andMe's Biological Age feature. We describe how we expanded on elements of PhenoAge to build 23andMe's Biological Age, a survival model trained on a large cohort from the UK Biobank (UKB). Since a larger set of blood biomarkers are available in 23andMe's blood biomarker panel and UKB than in NHANES, we developed our own model so that we could provide our own feature engineering and train the model on a larger cohort. The model was then validated on several ethnically-diverse subsets from UKB to show that the model performance is consistent across different ancestries. To further ensure our model's performance outside the somewhat limited CA range and ancestral background of the UKB participants, we used the NHANES as a validation set. We further computed the effect size of each biomarker per individual to provide a customized explanation of which biomarkers contribute to an increase or decrease of their BBA. We show that BBA is a better indicator of mortality risk and aging-related events than CA, particularly for cardiovascular and metabolic diseases. We also included a sex specific analysis to show that our result is consistent for both male and female participants (Supplementary Material). As an example of the dynamic nature of BBA, we include a longitudinal case study of an individual to demonstrate BBA's ability to reflect underlying changes to the physiological processes arising from environmental change over time. James R. Ashenhurst, Sarah B. Laskey, Jianan Zhan Aditya Ambati, Rafaela Bagur Quetglas, Anna Guan, Jingran Wen, Peter Wilton, Iris Yang, Bo Yoo, Sahil Hansalia, Yashashree Kokje, Ishana Raghuram, Akele Reed, Marcos Torres, 23andMe Research Team, Subarna Sinha, Theodore M. Wong, Bertram L. Koelsch. A Generalized Method for the Creation and Evaluation of Polygenic Scores: Accounting for Genetic Ancestry as a Continuum . 23andMe White Paper 23-25, December 2023 PDF (699 KB) Abstract Polygenic scores (PGS) strive to estimate the heritable portion of risk for many common diseases and other traits. Genome-wide association studies (GWAS) frequently identify multiple genetic variants with small to moderate individual impact on risk for a condition; many of these variants are commonly single nucleotide polymorphisms (SNPs). To quantify the cumulative impact of these variants on risk, machine learning methods are used to construct statistical models that generate polygenic scores. Recently, advances in modeling methodology have enabled massive increases in the number of genetic variants that can be included in polygenic models, leading to corresponding increases in the proportion of trait variance that these models explain. As a result, PGS are now being used to estimate heritable risk for a wide range of conditions and research is ongoing to evaluate their potential utility as part of clinical decision making. The key factor that limits researchers' ability to create large polygenic models is the size of the training cohort. Very large sample sizes are necessary both to identify genetic variants associated with a disease and to estimate their joint contribution to risk. Additionally, obtaining samples from diverse populations is necessary to create models that are calibrated to these populations, whether by assessing how well a model developed using data from a particular ancestry group (usually European) generalizes to other (often non-European) groups, or by developing models using data from various populations themselves. With over 14 million kits sold and approximately 80% of customers—including customers of many different ancestries—consenting to participate in research, 23andMe has a unique ability to develop large PGS that predict a wide range of health conditions and traits and to optimize and assess PGS performance across people with diverse ancestral backgrounds. Analyses of the company's genetic and self-reported health data show that we can replicate GWAS on clinically collected health information. Over the last several years, 23andMe has used PGS as the basis of dozens of customer reports on topics ranging from the ability to match a musical pitch to the likelihood of developing type 2 diabetes. Here we detail the modeling methodologies and evaluation procedures used to create the PGS behind new 23andMe Health Predisposition and Wellness reports on common health conditions. The Appendices to this White Paper further summarize the performance and characteristics of each PGS used in recently released reports (starting in 2024). We intend for this White Paper and the Appendices to be living documents that will be updated as methodologies change and new PGS-based genetic reports are released. A change log is provided at the bottom of this document to describe significant updates. Theodore M. Wong, Robert Preissl, Pallab Datta, Myron Flickner, Raghavendra Singh, Steven K. Esser, Emmett McQuinn, Rathinakumar Appuswamy, William P. Risk, Horst D. Simon, Dharmendra S. Modha. 10¹⁴. IBM Technical Paper RJ10502, November 2012 PDF (221 KB) Abstract Since the final submission of our work on the Compass scalable simulator for the IBM TrueNorth Cognitive Computing architecture, we have simulated an unprecedented 2.084 billion neurosynaptic cores containing 53×10¹⁰ neurons and 1.37×10¹⁴ synapses running at only 1542× slower than real time. We attained this scale by using the Sequoia 96-rack IBM® Blue Gene®/Q supercomputer at Lawrence Livermore National Labs. By comparison, the ultimate vision of the DARPA SyNAPSE program is to build a cognitive computing architecture with 10¹⁰ neurons and 10¹⁴ synapses, inspired by the following: Shepherd estimates the number of synapses in the human brain as 0.6×10¹⁴, and Koch estimates the number of synapses in the human brain as 2.4×10¹⁴. It is important to clarify that we have not built a biologically realistic simulation of the complete human brain. Rather, we have simulated a novel modular, scalable, non-von Neumann, ultra-low power, cognitive computing architecture at the DARPA SyNAPSE metric of 10¹⁴ synapses that itself is inspired by the number of synapses in the human brain. We mathematically abstract away computation (“neurons”), memory (“synapses”), and communication (“axons”, “dendrites”) from biological detail towards engineering goals of maximizing function (utility, applications) and minimizing hardware cost (power, area, delay) and design complexity. Theodore M. Wong, Richard A. Golding, Harvey M. Ruback, Wilfred Plouffe, and Scott A. Brandt. The Virtual Mission Bus . IBM Technical Paper RJ10472, September 2010 PDF (257 KB) Abstract Distributed, adaptive, hard real-time applications, such as process control or guidance systems, have requirements that go beyond those of traditional real-time systems: accommodation of a dynamic set of applications, autonomous adaptation as application requirements and system resources change, and security between applications from different organization. Developers need a middleware with features that support developing and running these applications, especially as commercial and defense systems become more "network-centric". The Virtual Mission Bus (VMB) middleware, targeted at both distributed IT systems and real-time systems, provides the essential basic services to support these applications and the tools for building more complex services, all while keeping the middleware kernel minimal enough for embedded system use. We successfully used the VMB to prototype a distributed spacecraft cluster system. Theodore M. Wong, Richard A. Golding, Joseph S. Glider, Elizabeth Borowsky, Ralph A. Becker-Szendy, Claudio Fleiner, Deepak R. Kenchammana-Hosekote, and Omer A. Zaki. Kybos: Self-management for distributed brick-based storage . IBM Technical Paper RJ10356, August 2005 PDF (197 KB) Abstract Current tools for storage system configuration management make offline decisions, recovering from, instead of preventing, performance specification violations. The consequences are severe in a large-scale system that requires complex actions to recover from failures, and can result in a temporary shutdown of the system. We introduce Kybos, a distributed storage system that makes online, autonomous responses to system changes. It runs on clusters of intelligent bricks, which provide local enforcement of global performance and reliability specifications and so isolate both recovery and application IO traffic. A management agent within Kybos translates simple, high-level specifications into brick-level enforcement targets, invoking centralized algorithms only when taking actions that require global state. Our initial implementation shows that this approach works well. Theodore M. Wong. Decentralized recovery for survivable storage systems . PhD dissertation (Technical Report CMU-CS-04-119), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 2004 PDF (714 KB) PostScript (1642 KB) Abstract Modern society has produced a wealth of data to preserve for the long term. Some data we keep for cultural benefit, in order to make it available to future generations, while other data we keep because of legal imperatives. One way to preserve such data is to store it using survivable storage systems. Survivable storage is distinct from reliable storage in that it tolerates confidentiality failures in which unauthorized users compromise component storage servers, as well as crash failures of servers. Thus, a survivable storage system can guarantee both the availability and the confidentiality of stored data. Research into survivable storage systems investigates the use of m-of-n threshold sharing schemes to distribute data to servers, in which each server receives a share of the data. Any m shares can be used to reconstruct the data, but any m - 1 shares reveal no information about the data. The central thesis of this dissertation is that to truly preserve data for the long term, a system that uses threshold schemes must incorporate recovery protocols able to overcome server failures, adapt to changing availability or confidentiality requirements, and operate in a decentralized manner. To support the thesis, I present the design and experimental performance analysis of a verifiable secret redistribution protocol for threshold sharing schemes. The protocol redistributes shares of data from old to new, possibly disjoint, sets of servers, such that new shares generated by redistribution cannot be combined with old shares to reconstruct the original data. The protocol is decentralized, and does not require intermediate reconstruction of the data; thus, it does not introduce a central point of failure or risk the exposure of the data during execution. The protocol incorporates a verification capability that enables new servers to confirm that their shares can be used to reconstruct the original data. Theodore M. Wong and Jeannette M. Wing. Verifiable secret redistribution . Technical Report CMU-CS-01-155, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, October 2001 PDF (176 KB) PostScript (204 KB)
Patents	Nelly Fazio, Richard A. Golding, Theodore M. Wong. Method for authenticated communication in dynamic federated environments. US Patent 9,130,757 (8 September 2015) Details According to one embodiment of the present invention, a method for protecting authenticated communication in dynamic federated environments is provided. The method includes distributing shares of a private signature key to a group of users. When switching from an existing to a new group of users, the method includes producing a plurality of sub-shares from each of the distributed shares of existing users, with each sub-share being accompanied by a corresponding validity proof. The sub-shares from multiple existing users are combined to generate a set of shares for new users, with each new share being derived from sub-shares from multiple existing users. Steven K. Esser, Dharmendra S. Modha, Theodore M. Wong. Cortical simulator for object-oriented simulation of a neural network. US Patent 9,130,757 (28 April 2015) Details Embodiments of the invention relate to a function-level simulator for modeling a neurosynaptic chip. One embodiment comprises simulating a neural network using an object-oriented framework including a plurality of object-oriented classes. Each class corresponds to a component of a neural network. Running a simulation model of the neural network includes instantiating multiple simulation objects from the classes. Each simulation object is an instance of one of the classes. Scott A. Brandt, Richard A. Golding, Theodore M. Wong. Integration of dissimilar job types into an earliest deadline first (EDF) schedule. US Patent 8,607,240 (10 December 2013) Details A method for implementation within a scheduler for a processor is adescribed. The method includes receiving a plurality of jobs from an earliest deadline first (EDF) schedule, wherein the scheduler implements an EDF scheduling model. The method also includes receiving a separate job from a source other than the EDF schedule. The separate job has a fixed scheduling requirement with a specific execution time. The method also includes determining an amount of available utilization capacity of the processor and inserting the separate job into an execution plan of the processor with the plurality of jobs from the EDF schedule in response to a determination that the available utilization capacity of the processor is sufficient to execute the separate job according to the fixed scheduling requirement associated with the separate job. Ralph A. Becker-Szendy, Richard A. Golding, Caixue Lin, Theodore M. Wong, and Omer A. Zaki. System and method for managing storage system performance as a resource. US Patent 7,962,563 (14 June 2011) Details A scheduler selects an I/O from a session of a pool and updates token buckets associated with resource limits and reserves for the session and the pool and statistics used in determining fair sharing. To select an I/O, the scheduler identifies sessions with a non-empty queue, identifies head I/Os in the queues, computes for the head I/O a deadline using session and pool reserve buckets and a release time using session and pool limit buckets, and selects a head I/O with an earliest deadline that is past the release time. If the deadline of the selected candidate head I/O is in the past, the scheduler transfers the selected head I/O to the tail of the storage device queue. Otherwise, the scheduler selects the pool with the least amount of I/O traffic according to a session fair share estimator. Richard A. Golding, Theodore M. Wong, and Omer A. Zaki. Computer program and method for managing resources in a distributed storage system. US Patent 7,694,082 (6 April 2010) Details A resource management system uses a virtual resource pool distributed across a set of storage devices to manage resources in a distributed storage system. The storage devices dedicate a resource in an allocation pool available to the virtual resource pool. The virtual resource pool is a virtual storage server in which an application receives at least a predetermined amount of storage capacity, a predetermined level of performance, or a predetermined reliability, represented by tokens. A virtual resource pool manager gives the tokens to an application. The application transmits the tokens along with the commands to the storage device. The token represents a right to consume up to some limit of resources on a specific storage device in a computing session. The storage device locally tracks resource consumption through the token. John Wilkes and Theodore M. Wong. Exclusive caching in computer systems. US Patent 6,851,024 (1 February 2005) Details A computer system with mechanisms for exclusive caching that avoids the accumulation of duplicate copies of information in host and storage system caches. A computer system according to these exclusive caching techniques includes a host system having a host cache and a storage system having a storage system cache and functionality for performing demote operations to coordinate the placement of information in the host cache to the storage system caches. John Wilkes and Theodore M. Wong. Adaptive data insertion for caching. US Patent 6,728,837 (27 April 2004) Details A computer system cache monitors the effectiveness of data inserted into a cache by one or more sources to determine which source should receive preferential treatment when updating the cache. The cache may be part of a computer system that includes a plurality of host systems; each host system includes a host cache, connected to a storage system having a storage system cache. Ghost caches are used to record hits from the plurality of host systems performing operations for storing and retrieving data from the storage system cache. The storage system cache includes a cache controller that is operable to calculate a merit figure and determine an insertion point in a queue associated with the storage system cache based on the merit figure. The merit figure is calculated using a weighting algorithm for weighting hits from the plurality of sources recorded in the ghost caches.

	HTML without this header
	PDF without this header and without project or publication abstracts

Resume

Theodore M. Wong, PhD 黃明道