Respawn: A Distributed Multi-Resolution Time-Series Datastore

As sensor networks gain traction and begin to scale, we will increasingly face challenges in managing large-scale time-series data. Respawn is a cloud-to-edge partitioned architecture capable of serving large amounts of time-series data from a continuously updating datastore with access latencies low enough to support interactive real-time visualization. Respawn targets sensing systems where resource-constrained edge node devices may have only limited or intermittent network connections linking them to a cloud-backend. The cloud-backend provides aggregate storage and transparent dispatching of data queries to edge node devices. Data is downsampled as it enters the system, creating a multi-resolution representation capable of low-latency range-based queries. Lower-resolution aggregate data is automatically migrated from edge nodes to the cloud-backend for both improved consistency and caching. To further mask latency from users, edge nodes automatically identify and migrate blocks of data that contain statistically interesting features. Respawn runs on ARM-based edge node devices connected to a cloud-backend and can serve thousands of clients and terabytes of data with sub-second latencies.
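As an illustration of the downsampling step, the sketch below builds a small resolution pyramid over a sensor feed as samples arrive; the level count, the factor of two, and the mean-based aggregation are assumptions chosen for illustration rather than Respawn's actual parameters.

    # A minimal sketch of the multi-resolution idea: raw samples are repeatedly
    # aggregated into coarser levels as they enter the system. The level count
    # and mean-based aggregation are assumptions, not Respawn's actual scheme.
    from statistics import mean

    LEVELS = 4  # number of lower-resolution levels to maintain (assumed)

    def downsample(samples, factor=2):
        # Collapse each run of `factor` consecutive samples into its mean.
        return [mean(samples[i:i + factor]) for i in range(0, len(samples), factor)]

    def build_pyramid(samples):
        # Level 0 is the raw data; each higher level halves the resolution of
        # the level below it, so long time ranges can be answered from a small,
        # coarse level instead of scanning full-resolution data.
        pyramid = [list(samples)]
        for _ in range(LEVELS):
            pyramid.append(downsample(pyramid[-1]))
        return pyramid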

Respawn is a distributed time-series datastore designed to manage hundreds of thousands of sensor feeds while providing range-based queries at sub-second latencies from resource-constrained devices. To achieve this, Respawn leverages two concepts: cloud-to-edge partitioning and multi-resolution storage. In Respawn’s distributed architecture, communication is load-balanced between a few server-class machines and many inexpensive, low-end embedded edge devices. This is achieved by partitioning the data between the cloud and the edge, storing low-resolution aggregate data on cloud nodes and high-resolution data on field-deployed edge nodes. A dispatcher front-end is responsible for directing queries between the cloud and the edge and, in effect, for maintaining low request latencies. The dispatcher is designed to operate predominantly out of memory to support tens of thousands of concurrent connections.
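The dispatcher's routing decision can be summarized roughly as follows; the RangeQuery fields, the CLOUD_MIN_LEVEL threshold, and the range_query/reachable names are hypothetical placeholders for illustration, not Respawn's actual interfaces.

    # A minimal sketch of cloud-to-edge query dispatch, assuming a query carries
    # a feed id, a time range, and a resolution level. Names and threshold are
    # hypothetical.
    from dataclasses import dataclass

    CLOUD_MIN_LEVEL = 2  # assumed: levels at or above this are aggregated in the cloud

    @dataclass
    class RangeQuery:
        feed_id: str
        start: int   # epoch seconds
        end: int     # epoch seconds
        level: int   # 0 = full resolution, higher = coarser

    def dispatch(query, cloud_store, edge_nodes):
        edge = edge_nodes.get(query.feed_id)
        if query.level >= CLOUD_MIN_LEVEL or edge is None or not edge.reachable:
            # Low-resolution aggregates (or an unreachable edge node) are
            # answered directly from the cloud-backend.
            return cloud_store.range_query(query.feed_id, query.start, query.end, query.level)
        # High-resolution data still lives on the field-deployed edge node,
        # so the dispatcher forwards the request there.
        return edge.range_query(query.feed_id, query.start, query.end, query.level)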

Since edge nodes typically have latencies almost an order of magnitude greater than the cloud, selective data migration can be used to further improve latency. A Quality-of-Service (QoS) parameter at each gateway node determines how much bandwidth is available for data and hence how aggressively tiles can be migrated. Top-level aggregate tiles are migrated first, since these are typically used as starting points for range-based queries. Lower-level (higher-resolution) tiles are migrated based on both client access patterns and data metrics such as standard deviation. Standard deviation is one of many metrics that can be used to pinpoint the tiles likely to be of most interest to clients.
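One way to read that policy is sketched below: coarser tiles are uploaded first, and within a level, tiles with higher standard deviation are preferred, up to a byte budget derived from the QoS parameter. The tile representation, the priority rule, and the bandwidth accounting are assumptions for illustration rather than Respawn's actual migration logic.

    # A minimal sketch of migration scheduling on an edge node (illustrative;
    # tile fields, scoring rule, and budget handling are assumed).
    from statistics import pstdev

    def migration_order(tiles):
        # Sort by level descending (top-level aggregates first), then by
        # standard deviation descending (statistically interesting data first).
        return sorted(tiles, key=lambda t: (-t["level"], -pstdev(t["samples"])))

    def migrate(tiles, upload, budget_bytes):
        # Upload tiles in priority order until the QoS bandwidth budget is spent.
        spent = 0
        for tile in migration_order(tiles):
            size = len(tile["samples"]) * 8  # assume 8-byte samples
            if spent + size > budget_bytes:
                break
            upload(tile)
            spent += size
        return spent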

Publications

Maxim Buevich, Niranjini Rajagopal, Anthony Rowe, “Hardware Assisted Clock Synchronization for Real-Time Sensor Networks”, IEEE Real-Time Systems Symposium (RTSS 2013), Vancouver, Canada, 2013. (pdf)
