Research


I am a Research Staff Member at the IBM Almaden Research Center, which I joined in 2003 after completing my PhD at Carnegie Mellon University. I'm currently one of the technical leaders for the Pleiades project, a DARPA-funded effort to demonstrate that a traditional, large, monolithic satellite can be replaced by a group of smaller, individually launched, wirelessly networked and cluster-flown spacecraft modules. My focus within Pleiades is on the middleware for coordinating distributed software applications running across the spacecraft. More generally, my research interests include lightweight distributed consistency control, secure group membership protocols, and algorithms for automatic resource reservation and management.

The research projects I am working on (or have worked on) are described below, most recent first.


Current Research

The Pleiades architecture for DARPA System F6 (Future, fast, flexible, fractionated, free-flying spacecraft united by information exchange)

The DARPA System F6 program intends to demonstrate that a traditional, large, monolithic satellite can be replaced by a group of smaller, individually launched, wirelessly networked and cluster-flown spacecraft modules. Each ‘fractionated’ module can contribute a unique capability to the rest of the network, such as computing, ground communications, or payload functionality. The ultimate goal of the program is to launch a fractionated spacecraft system and demonstrate it in orbit in approximately four years.

Orbital Sciences Corporation, teamed with IBM, Jet Propulsion Laboratory, Georgia Institute of Technology, SpaceDev, and Aurora Flight Sciences, has received an award for the first phase of System F6 to:

  • Develop key technologies to enable the fractionated approach, including robust networking, reliable wireless communications, fault-tolerant distributed computing, wireless power transfer, and autonomous cluster navigation

  • Select a space system mission of value to a national security space stakeholder and develop a system design to accomplish that mission

  • Develop an innovative analytical approach using econometric tools that determine the risk-adjusted cost and value of both a fractionated space system and a monolithic program of record with equivalent capability

  • Develop an evolved hardware-in-the-loop test-bed to emulate the designed fractionated spacecraft using a cluster of networked computers.

Publications include:

  • David M. LoBosco, Glen E. Cameron, Richard A. Golding, and Theodore M. Wong. The Pleiades fractionated space system architecture and the future of national security space. In Proceedings of the AIAA SPACE 2008 Conference, September 2008

    [PDF] Acrobat PDF (992 KB)

End-to-end performance management for large distributed storage

Storage systems for large and distributed clusters of compute servers are themselves large and distributed. Their complexity and scale make these systems hard to manage, and in particular make it hard to ensure that the applications using them get good, predictable performance. At the same time, shared access to the system by multiple applications and users, together with competition from internal system activities, increases the need for predictable performance.

The storage quality-of-service project at the UCSC Storage Systems Research Center investigates how to improve performance in large distributed storage systems through mechanisms that integrate the performance aspects of the whole path that I/O operations take through the system: from the application interface on the compute server, through the network, to the storage servers. We focus on five parts of the I/O path in a distributed storage system: I/O scheduling at the storage server, storage server cache management, client-to-server network flow control, client-to-server connection management, and client cache management.

Publications:

  • Tim Kaldewey, Theodore M. Wong, Richard Golding, Anna Povzner, Scott Brandt, and Carlos Maltzahn. Virtualizing disk performance. In Proceedings of the 14th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2008), April 2008 (Best student paper)

    [PDF] Acrobat PDF (256 KB)

  • Anna Povzner, Tim Kaldewey, Scott Brandt, Richard Golding, Theodore M. Wong, and Carlos Maltzahn. Efficient guaranteed disk request scheduling with Fahrrad. In Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2008 (EuroSys 2008), April 2008

    [PDF] Acrobat PDF (400 KB)

  • David O. Bigelow, Suresh Iyer, Tim Kaldewey, Roberto C. Pineiro, Anna Povzner, Scott A. Brandt, Richard A. Golding, Theodore M. Wong, and Carlos Maltzahn. End-to-end performance management for scalable distributed storage. In Proceedings of the Petascale Data Storage Workshop, November 2007

    [PDF] Acrobat PDF (130 KB)

  • Theodore M. Wong, Richard A. Golding, Caixue Lin, and Ralph A. Becker-Szendy. Zygaria: Storage performance as a managed resource. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), April 2006

    [PDF] Acrobat PDF (424 KB)

Previous Research

Self-managing heterogeneous storage systems

The growth in the amount of data being stored and manipulated for commercial, scientific, and intelligence applications is worsening the manageability and reliability of data storage systems. The expansion of such large-scale storage systems into petabyte capacities puts pressure on cost, leading to systems built out of many cheap but relatively unreliable commodity storage servers. These systems are expensive and difficult to manage—current figures show that management and operation costs are often several times purchase cost—partly because of the number of components to configure and monitor, and partly because system management actions often have unexpected, system-wide side effects. Also, these systems are vulnerable to attack because they have many entry points, and because there are no mechanisms to contain the effects either of attacks or of subsystem failures.

Kybos is a distributed storage system that addresses these issues. It will provide manageable, available, reliable, and secure storage for large data collections, including data that is distributed over multiple geographical sites. Kybos is self-managing, which reduces the cost of administration by eliminating complex management operations and simplifying the model by which administrators configure and monitor the system. Kybos stores data redundantly across multiple commodity storage servers, so that the failure of any one server does not compromise data. Finally, Kybos is built as a loosely-coupled federation of servers, so that the compromise or failure of some servers will not impede remaining servers from continuing to take collective action toward system goals.

Our primary application is the self-management of federated (but potentially unreliable) clusters of storage servers, but we anticipate that the algorithms we have developed (and will implement) will have broad applicability to the general class of problems involving the coordination of independent autonomous agents with a collective set of mission goals.

Publications:

  • Richard A. Golding and Theodore M. Wong. Walking toward moving goalposts: agile management for evolving systems. In Proceedings of the First Workshop on Hot Topics in Autonomic Computing (HotAC I), June 2006

    [PDF] Acrobat PDF (136 KB)

  • W. W. Wilcke, R. B. Garner, C. Fleiner, R. F. Freitas, R. A. Golding, J. S. Glider, D. R. Kenchammana-Hosekote, J. L. Hafner, K. M. Mohiuddin, KK Rao, R. A. Becker-Szendy, T. M. Wong, O. A. Zaki, M. Hernandez, K. R. Fernandez, H. Huels, H. Lenk, K. Smolin, M. Ries, C. Goettert, T. Picunko, B. J. Rubin, H. Kahn, and T. Loo. IBM Intelligent Bricks project—Petabytes and beyond. IBM Journal of Research and Development, 50(2/3), pp. 181–198, March–May 2006

  • Theodore M. Wong, Richard A. Golding, Joseph S. Glider, Elizabeth Borowsky, Ralph A. Becker-Szendy, Claudio Fleiner, Deepak R. Kenchammana-Hosekote, and Omer A. Zaki. Kybos: Self-management for distributed brick-based storage. IBM Technical Paper RJ10356, August 2005

    [PDF] Acrobat PDF (208 KB)

Decentralized recovery for survivable storage systems

Modern society has produced a wealth of data to preserve for the long term. Some data we keep for cultural benefit, in order to make it available to future generations, while other data we keep because of legal imperatives. One way to preserve such data is to store it using survivable storage systems. Survivable storage is distinct from reliable storage in that it tolerates confidentiality failures in which unauthorized users compromise component storage servers, as well as crash failures of servers. Thus, a survivable storage system can guarantee both the availability and the confidentiality of stored data.

Research into survivable storage systems investigates the use of m-of-n threshold sharing schemes to distribute data to servers, in which each server receives a share of the data. Any m shares can be used to reconstruct the data, but any m - 1 shares reveal no information about the data. The central thesis of this dissertation is that to truly preserve data for the long term, a system that uses threshold schemes must incorporate recovery protocols able to overcome server failures, adapt to changing availability or confidentiality requirements, and operate in a decentralized manner.
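As a rough illustration of the m-of-n property described above, the following Python sketch uses Shamir's polynomial-based scheme, which is one well-known threshold sharing scheme; choosing it here is an assumption made for illustration. The names split, reconstruct, and PRIME are hypothetical, and the sketch shares a single integer rather than real stored data.

```python
import secrets

# Prime modulus of the finite field (2**127 - 1 is a Mersenne prime).
# Illustrative choice; the secret must be smaller than this value.
PRIME = 2**127 - 1


def split(secret, m, n, prime=PRIME):
    """Return n shares (x, y) such that any m of them determine the secret."""
    if not (1 <= m <= n < prime):
        raise ValueError("need 1 <= m <= n < prime")
    if not (0 <= secret < prime):
        raise ValueError("secret must lie in [0, prime)")
    # Random polynomial of degree m - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def reconstruct(shares, prime=PRIME):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


if __name__ == "__main__":
    shares = split(123456789, m=3, n=5)
    print(reconstruct(shares[:3]) == 123456789)
    print(reconstruct(shares[2:]) == 123456789)
```

Any three of the five shares interpolate the same degree-two polynomial and therefore recover the secret, while two shares are consistent with every possible secret value.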

To support the thesis, I present the design and experimental performance analysis of a verifiable secret redistribution protocol for threshold sharing schemes. The protocol redistributes shares of data from old to new, possibly disjoint, sets of servers, such that new shares generated by redistribution cannot be combined with old shares to reconstruct the original data. The protocol is decentralized, and does not require intermediate reconstruction of the data; thus, it does not introduce a central point of failure or risk the exposure of the data during execution. The protocol incorporates a verification capability that enables new servers to confirm that their shares can be used to reconstruct the original data.
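The redistribution idea can be sketched in Python under some loud assumptions: the shares are Shamir shares over a prime field, each old server re-shares its own share among the new servers, and each new server combines what it receives using Lagrange coefficients. This is one standard way to redistribute shares without reconstructing the secret; whether the protocol described above works exactly this way is an assumption, and the verification capability is omitted entirely. All names (make_shares, lagrange_at_zero, combine, redistribute) are hypothetical.

```python
import secrets

PRIME = 2**127 - 1  # illustrative prime field modulus


def make_shares(value, m, xs, prime=PRIME):
    """Share `value` with threshold m among the servers whose indices are in xs."""
    coeffs = [value] + [secrets.randbelow(prime) for _ in range(m - 1)]
    return {x: sum(c * pow(x, k, prime) for k, c in enumerate(coeffs)) % prime
            for x in xs}


def lagrange_at_zero(xs, prime=PRIME):
    """Return coefficients b[x] such that f(0) = sum(b[x] * f(x)) mod prime."""
    b = {}
    for xi in xs:
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * -xj % prime
                den = den * (xi - xj) % prime
        b[xi] = num * pow(den, -1, prime) % prime
    return b


def combine(shares, prime=PRIME):
    """Reconstruct the shared value from a dict {x: y} of at least m shares."""
    b = lagrange_at_zero(list(shares), prime)
    return sum(b[x] * y for x, y in shares.items()) % prime


def redistribute(old_shares, new_m, new_xs, prime=PRIME):
    """Turn m old shares into new shares with threshold new_m.

    The secret is never reconstructed: each old server i only re-shares
    its own share, and each new server j only sums the subshares it gets.
    """
    b = lagrange_at_zero(list(old_shares), prime)
    # Step 1: every participating old server splits its share into subshares.
    subshares = {i: make_shares(s_i, new_m, new_xs, prime)
                 for i, s_i in old_shares.items()}
    # Step 2: every new server combines the subshares addressed to it.
    return {j: sum(b[i] * subshares[i][j] for i in old_shares) % prime
            for j in new_xs}


if __name__ == "__main__":
    old = make_shares(42, 2, [1, 2, 3])            # 2-of-3 sharing
    chosen = {x: old[x] for x in (1, 3)}           # any 2 old servers take part
    new = redistribute(chosen, 3, [1, 2, 3, 4, 5])  # becomes 3-of-5
    print(combine({x: new[x] for x in (2, 4, 5)}) == 42)
```

The new shares lie on a fresh polynomial that is unrelated to the old one except for its constant term, which is why old and new shares cannot be mixed during reconstruction.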

Publications:

  • Theodore M. Wong. Decentralized recovery for survivable storage systems. PhD dissertation (Technical Report CMU-CS-04-119), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 2004

    [PDF] Acrobat PDF (714 KB)    [PostScript] PostScript (1642 KB)

  • Theodore M. Wong, Chenxi Wang, and Jeannette M. Wing. Verifiable secret redistribution for archive systems. In Proceedings of the First International IEEE Security in Storage Workshop (SISW 2002), December 2002

    [PDF] Acrobat PDF (424 KB)

  • Theodore M. Wong and Jeannette M. Wing. Verifiable secret redistribution. Technical Report CMU-CS-01-155, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, October 2001

    [PDF] Acrobat PDF (176 KB)    [PostScript] PostScript (204 KB)

Exclusive caching in hierarchical storage systems

[I began this research project while interning with the Storage Systems Program at Hewlett-Packard Labs.]

Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies that duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it were only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive.

We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful (sometimes substantial) speedups.
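The effect of demotion can be illustrated with a toy Python simulation. It is a minimal sketch, not the simulator used in the study: it assumes one client, LRU replacement at both levels, and an array that drops a block as soon as the client reads it. The names LRUCache and run are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache of block ids, ordered from least to most recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def hit(self, block):
        """Return True, and mark the block most recently used, if it is cached."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def insert(self, block):
        """Insert at the MRU end; return the evicted block id, or None."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            victim, _ = self.blocks.popitem(last=False)
            return victim
        return None

    def remove(self, block):
        self.blocks.pop(block, None)


def run(trace, client_size, array_size, exclusive):
    """Replay a read trace and return how many reads had to go to disk."""
    client, array = LRUCache(client_size), LRUCache(array_size)
    disk_reads = 0
    for block in trace:
        if client.hit(block):
            continue
        if exclusive:
            if block in array.blocks:
                array.remove(block)       # the client now holds the only copy
            else:
                disk_reads += 1           # read passes through; array keeps no copy
            victim = client.insert(block)
            if victim is not None:
                array.insert(victim)      # DEMOTE: client eviction goes to the array
        else:
            if not array.hit(block):
                disk_reads += 1
                array.insert(block)       # inclusive: array keeps a duplicate
            client.insert(block)
    return disk_reads


if __name__ == "__main__":
    trace = list(range(6)) * 10           # cyclic scan over 6 blocks
    print(run(trace, 4, 4, exclusive=False))  # 60
    print(run(trace, 4, 4, exclusive=True))   # 6
```

With a cyclic scan over six blocks and four-block caches at each level, the inclusive configuration misses on every read, while the exclusive configuration behaves like a single eight-block cache and goes to disk only for the first six reads.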

During our investigation, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.

Publications:

  • Theodore M. Wong and John Wilkes. My cache or yours? Making storage more exclusive. In Proceedings of the USENIX Annual Technical Conference, June 2002, pp. 161–175

    [PDF] Acrobat PDF (264 KB)

Patents:

  • John Wilkes and Theodore M. Wong. Exclusive caching in computer systems. United States Patent 6,851,024 (granted 1 February 2005)

  • John Wilkes and Theodore M. Wong. Adaptive data insertion for caching. United States Patent 6,728,837 (granted 27 April 2004)


Theodore Wong
Last modified: Mon Aug 11 15:05:17 EDT 2008