Research


I am a Research Staff Member at the IBM Almaden Research Center, which I joined in 2003 after completing my PhD at Carnegie Mellon University. I'm currently one of the technical leaders for the Virtual Mission Bus (VMB) project, an effort to develop a middleware system that supports distributed, adaptive, hard real-time applications. Such applications have requirements that go beyond those of traditional real-time systems: accommodation of a dynamic set of applications, autonomous adaptation as application requirements and system resources change, and security between applications from different organizations. The VMB provides the essential basic services to support these applications and the tools for building more complex services, all while keeping the middleware kernel minimal enough for embedded system use.

In general, my research interests include: lightweight distributed consistency control, secure group membership protocols, and algorithms for automatic resource reservation and management. Research projects I am working on (or have worked on), in descending chronological order, include:



Current Research

The Virtual Mission Bus middleware system, as part of the Pleiades architecture for DARPA System F6

Distributed, adaptive, hard real-time applications, such as process control or guidance systems, have requirements that go beyond those of traditional real-time systems: accommodation of a dynamic set of applications, autonomous adaptation as application requirements and system resources change, and security between applications from different organizations. Developers need middleware with features that support developing and running these applications, especially as commercial and defense systems become more network-centric. The Virtual Mission Bus (VMB) middleware, targeted at both distributed IT systems and real-time systems, provides the essential basic services to support these applications and the tools for building more complex services, all while keeping the middleware kernel minimal enough for embedded system use. We successfully used the VMB to prototype a distributed spacecraft cluster system.

Publications include:

End-to-end performance management for large distributed storage

Storage systems for large and distributed clusters of compute servers are themselves large and distributed. Their complexity and scale make these systems hard to manage, and in particular make it hard to ensure that the applications using them get good, predictable performance. At the same time, shared access to the system by multiple applications and users, together with competition from internal system activities, makes predictable performance all the more necessary.

The storage quality-of-service project at the UCSC Storage Systems Research Center investigates ways to improve performance in large distributed storage systems through mechanisms that integrate the performance aspects of the whole path that I/O operations take through the system: from the application interface on the compute server, through the network, to the storage servers. We focus on five parts of the I/O path in a distributed storage system: I/O scheduling at the storage server, storage server cache management, client-to-server network flow control, client-to-server connection management, and client cache management.

Publications:

  • Tim Kaldewey, Theodore M. Wong, Richard Golding, Anna Povzner, Scott Brandt, and Carlos Maltzahn. Virtualizing disk performance. In Proceedings of the 14th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2008), April 2008 (Best student paper)

    [PDF] Acrobat PDF (256 KB)

  • Anna Povzner, Tim Kaldewey, Scott Brandt, Richard Golding, Theodore M. Wong, and Carlos Maltzahn. Efficient guaranteed disk request scheduling with Fahrrad. In Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2008 (EuroSys 2008), April 2008

    [PDF] Acrobat PDF (400 KB)

  • David O. Bigelow, Suresh Iyer, Tim Kaldewey, Roberto C. Pineiro, Anna Povzner, Scott A. Brandt, Richard A. Golding, Theodore M. Wong, and Carlos Maltzahn. End-to-end performance management for scalable distributed storage. In Proceedings of the Petascale Data Storage Workshop, November 2007

    [PDF] Acrobat PDF (130 KB)

  • Theodore M. Wong, Richard A. Golding, Caixue Lin, and Ralph A. Becker-Szendy. Zygaria: Storage performance as a managed resource. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), April 2006

    [PDF] Acrobat PDF (424 KB)

Previous Research

Self-managing heterogeneous storage systems

The growth in the amount of data being stored and manipulated for commercial, scientific, and intelligence applications is worsening the manageability and reliability of data storage systems. The expansion of such large-scale storage systems into petabyte capacities puts pressure on cost, leading to systems built out of many cheap but relatively unreliable commodity storage servers. These systems are expensive and difficult to manage (current figures show that management and operation costs are often several times the purchase cost), partly because of the number of components to configure and monitor, and partly because system management actions often have unexpected, system-wide side effects. These systems are also vulnerable to attack because they have many entry points, and because there are no mechanisms to contain the effects of either attacks or subsystem failures.

Kybos is a distributed storage system that addresses these issues. It will provide manageable, available, reliable, and secure storage for large data collections, including data that is distributed over multiple geographical sites. Kybos is self-managing, which reduces the cost of administration by eliminating complex management operations and simplifying the model by which administrators configure and monitor the system. Kybos stores data redundantly across multiple commodity storage servers, so that the failure of any one server does not compromise data. Finally, Kybos is built as a loosely coupled federation of servers, so that the compromise or failure of some servers will not prevent the remaining servers from continuing to take collective action toward system goals.

Our primary application is the self-management of federated (but potentially unreliable) clusters of storage servers, but we anticipate that the algorithms we have developed (and will implement) will have broad applicability to the general class of problems involving the coordination of independent autonomous agents with a collective set of mission goals.

Publications:

  • Richard A. Golding and Theodore M. Wong. Walking toward moving goalposts: agile management for evolving systems. In Proceedings of the First Workshop on Hot Topics in Autonomic Computing (HotAC I), June 2006

    [PDF] Acrobat PDF (136 KB)

  • W. W. Wilcke, R. B. Garner, C. Fleiner, R. F. Freitas, R. A. Golding, J. S. Glider, D. R. Kenchammana-Hosekote, J. L. Hafner, K. M. Mohiuddin, KK Rao, R. A. Becker-Szendy, T. M. Wong, O. A. Zaki, M. Hernandez, K. R. Fernandez, H. Huels, H. Lenk, K. Smolin, M. Ries, C. Goettert, T. Picunko, B. J. Rubin, H. Kahn, and T. Loo. IBM Intelligent Bricks project—Petabytes and beyond. IBM Journal of Research and Development, 50(2/3), pp. 181–198, March–May 2006

  • Theodore M. Wong, Richard A. Golding, Joseph S. Glider, Elizabeth Borowsky, Ralph A. Becker-Szendy, Claudio Fleiner, Deepak R. Kenchammana-Hosekote, and Omer A. Zaki. Kybos: Self-management for distributed brick-based storage. IBM Technical Paper RJ10356, August 2005

    [PDF] Acrobat PDF (208 KB)

Decentralized recovery for survivable storage systems

Modern society has produced a wealth of data to preserve for the long term. Some data we keep for cultural benefit, in order to make it available to future generations, while other data we keep because of legal imperatives. One way to preserve such data is to store it using survivable storage systems. Survivable storage is distinct from reliable storage in that it tolerates confidentiality failures in which unauthorized users compromise component storage servers, as well as crash failures of servers. Thus, a survivable storage system can guarantee both the availability and the confidentiality of stored data.

Research into survivable storage systems investigates the use of m-of-n threshold sharing schemes to distribute data to servers, in which each server receives a share of the data. Any m shares can be used to reconstruct the data, but any m - 1 shares reveal no information about the data. The central thesis of this dissertation is that to truly preserve data for the long term, a system that uses threshold schemes must incorporate recovery protocols able to overcome server failures, adapt to changing availability or confidentiality requirements, and operate in a decentralized manner.
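As an illustration of the m-of-n property described above, here is a minimal sketch of one well-known threshold scheme, Shamir's polynomial secret sharing. The prime modulus, the function names `split` and `reconstruct`, and the use of integers as secrets are assumptions made for this example; they are not taken from the dissertation.

```python
import secrets

# Illustrative field modulus (the Mersenne prime 2^127 - 1); an assumption.
PRIME = 2**127 - 1

def split(secret, m, n):
    """Split `secret` into n shares such that any m of them reconstruct it."""
    # Random polynomial of degree m - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    # Server i receives the point (i, f(i)).
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Recover f(0) from m shares by Lagrange interpolation at x = 0."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `reconstruct(split(1234, 3, 5)[:3])` recovers `1234` from any three of the five shares, while two shares alone are consistent with every possible secret.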

To support the thesis, I present the design and experimental performance analysis of a verifiable secret redistribution protocol for threshold sharing schemes. The protocol redistributes shares of data from old to new, possibly disjoint, sets of servers, such that new shares generated by redistribution cannot be combined with old shares to reconstruct the original data. The protocol is decentralized, and does not require intermediate reconstruction of the data; thus, it does not introduce a central point of failure or risk the exposure of the data during execution. The protocol incorporates a verification capability that enables new servers to confirm that their shares can be used to reconstruct the original data.
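The core redistribution step can be sketched as follows, assuming Shamir-style polynomial shares over a prime field. This is a simplified illustration only: it omits the verification capability entirely, and the modulus and all function names (`share`, `lagrange_at_zero`, `redistribute`, `reconstruct`) are hypothetical choices for this example rather than the protocol as specified in the dissertation.

```python
import secrets

PRIME = 2**127 - 1  # illustrative field modulus; an assumption

def eval_poly(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule
        acc = (acc * x + c) % PRIME
    return acc

def share(value, m, n):
    """m-of-n sharing of `value`; returns {server index: share}."""
    coeffs = [value] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    return {i: eval_poly(coeffs, i) for i in range(1, n + 1)}

def lagrange_at_zero(xs):
    """Coefficients b_i such that f(0) = sum(b_i * f(x_i)) for the given x_i."""
    coeffs = {}
    for xi in xs:
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        coeffs[xi] = num * pow(den, -1, PRIME) % PRIME
    return coeffs

def reconstruct(shares):
    b = lagrange_at_zero(list(shares))
    return sum(b[x] * y for x, y in shares.items()) % PRIME

def redistribute(old_shares, new_m, new_n):
    """Turn m old shares into a fresh new_m-of-new_n sharing.

    `old_shares` holds exactly m old shares as {index: value}. The data
    itself is never reconstructed at any point.
    """
    b = lagrange_at_zero(list(old_shares))
    # Each old server splits its own share into subshares for the new servers.
    subshares = {i: share(s_i, new_m, new_n) for i, s_i in old_shares.items()}
    # Each new server j combines the subshares it received into its new share.
    return {
        j: sum(b[i] * subshares[i][j] for i in old_shares) % PRIME
        for j in range(1, new_n + 1)
    }
```

In this sketch each new share lies on a freshly randomized polynomial whose constant term is still the original data, which is why the new servers can reconstruct it without any old server ever revealing its share to a single party.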

Publications:

  • Theodore M. Wong. Decentralized recovery for survivable storage systems. PhD dissertation (Technical Report CMU-CS-04-119), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 2004

    [PDF] Acrobat PDF (714 KB)    [PostScript] PostScript (1642 KB)

  • Theodore M. Wong, Chenxi Wang, and Jeannette M. Wing. Verifiable secret redistribution for archive systems. In Proceedings of the First International IEEE Security in Storage Workshop (SISW 2002), December 2002

    [PDF] Acrobat PDF (424 KB)

  • Theodore M. Wong and Jeannette M. Wing. Verifiable secret redistribution. Technical Report CMU-CS-01-155, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, October 2001

    [PDF] Acrobat PDF (176 KB)    [PostScript] PostScript (204 KB)

Thesis committee:

Exclusive caching in hierarchical storage systems

[I began this research project while interning with the Storage Systems Program at Hewlett-Packard Labs.]

Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies that duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it were only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive.

We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful (sometimes substantial) speedups.
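The difference between inclusive and exclusive caching can be illustrated with a toy two-level simulation. This is a simplified sketch, not the simulator used in the study: both levels use plain LRU, a block read from the array is simply dropped from the array cache in exclusive mode, and the names `LRU` and `disk_reads` are hypothetical.

```python
from collections import OrderedDict

class LRU:
    """Fixed-capacity LRU cache of block identifiers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def hit(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        return False

    def insert(self, block):
        """Insert at the MRU end; return the evicted block, if any."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            victim, _ = self.blocks.popitem(last=False)
            return victim
        return None

    def remove(self, block):
        self.blocks.pop(block, None)

def disk_reads(trace, client_size, array_size, exclusive):
    """Count reads that miss in both the client and the array cache."""
    client, array = LRU(client_size), LRU(array_size)
    misses = 0
    for block in trace:
        if client.hit(block):
            continue
        if array.hit(block):
            if exclusive:
                array.remove(block)  # the block now lives only at the client
        else:
            misses += 1
            if not exclusive:
                array.insert(block)  # inclusive: the array keeps a copy
        victim = client.insert(block)
        if exclusive and victim is not None:
            array.insert(victim)  # DEMOTE the ejected block to the array
    return misses
```

On a cyclic trace such as `list(range(6)) * 10` with two 4-block caches, the inclusive configuration goes to disk for all 60 reads, because the working set of 6 blocks exceeds either cache alone. The exclusive configuration goes to disk only for the 6 cold misses, because the two caches together hold 8 distinct blocks.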

During our investigation, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.

Publications:

  • Theodore M. Wong and John Wilkes. My cache or yours? Making storage more exclusive. In Proceedings of the USENIX Annual Technical Conference, June 2002, pp. 161–175

    [PDF] Acrobat PDF (264 KB)

Patents:

  • John Wilkes and Theodore M. Wong. Exclusive caching in computer systems. United States Patent 6,851,024 (granted 1 February 2005)

  • John Wilkes and Theodore M. Wong. Adaptive data insertion for caching. United States Patent 6,728,837 (granted 27 April 2004)


Copyright © 2001–2009 Theodore Wong
Last modified: Thu Aug 13 20:00:12 EDT 2009