Flash on Servers

The original case for network storage rests on the benefits of aggregating disks: more random IOPS and bandwidth, plus the management simplicity of concentrating data in one central place. Historically, a single application on a single host has had a somewhat limited ability to consume that bandwidth.

Flash in a networked storage environment does not get the same benefits from sharing. A single flash drive has enough IOPS to satisfy most apps, so the equation changes. Is network storage still fast when the network latency is noticeable compared to flash latency? We can use 5 SSDs to get 40,000 IOPS, but will the latency be a problem?
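To make the latency question concrete, here is a rough back-of-the-envelope sketch. Only the "5 SSDs for 40,000 IOPS" figure comes from the talk; the device and network latencies below are illustrative assumptions of mine, not numbers Steve presented.

```python
def per_drive_iops(total_iops: int, drives: int) -> float:
    """IOPS each drive must deliver to reach the aggregate target."""
    return total_iops / drives


def network_share(device_latency_us: float, network_latency_us: float) -> float:
    """Fraction of end-to-end latency that is spent on the network."""
    return network_latency_us / (device_latency_us + network_latency_us)


# From the talk: 5 SSDs giving 40,000 IOPS.
print(per_drive_iops(40_000, 5))  # 8000.0

# ASSUMED, illustrative latencies in microseconds (not from the talk).
DISK_US = 5_000.0    # a random read on spinning disk
FLASH_US = 100.0     # a random read on flash
NETWORK_US = 200.0   # a network round trip to a storage server

print(round(network_share(DISK_US, NETWORK_US), 3))   # 0.038
print(round(network_share(FLASH_US, NETWORK_US), 3))  # 0.667
```

Under these assumed numbers the network is a rounding error in front of disk but the majority of the latency in front of flash, which is the concern raised above.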

We will see flash on compute servers. Servers can already accommodate flash, and it is cheap enough: 250GB costs less than 10% of the server cost.

A new tier in the storage hierarchy: Steve argues we will see storage split into two tiers, the IOPS tier and the capacity tier. The IOPS-Tier will offer the best $/IOPS with low latency and will have sufficient capacity for the active data in most typical enterprise systems. It will be the new “primary storage”. The Capacity-Tier will offer the best $/GB, with deduplication, compression, and good serial performance.

Let's outline three broad categories of host-based flash usage: DAS (direct-attached storage), Primary Storage Service, and Cache.

· Host-Based Flash: DAS: This is a single point of failure, so you would use normal backup or application-based replication (like Exchange or Oracle). It can hold boot files, VMs, and temp files.

· Host-Based Flash: Primary Storage Service: File or block. Requires mirroring or RAID to peers. Allows capacity sharing. Steve believes Microsoft and VMware are likely to compete here; the integration of this is complex. Making reliable primary storage is hard, and this may impact the roles of folks in IT.

· Host-Based Flash: Cache: File or block. This will be a write-through cache. If you use a write-back cache, you will occasionally lose data. Using a cache allows for centralized data management (in a remote network storage server). This can help preserve the current IT storage management roles by providing automatic “tiering”.
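The write-through versus write-back distinction in the cache case can be sketched in a few lines. This is a toy model of my own, not anything shown in the talk: the class name `HostFlashCache`, the `lose_host` method, and the use of plain dictionaries to stand in for host flash and the remote storage server are all hypothetical.

```python
class HostFlashCache:
    """Toy host-side cache in front of a remote store (hypothetical model)."""

    def __init__(self, backing: dict, write_back: bool = False):
        self.backing = backing        # stands in for the network storage server
        self.cache = {}               # stands in for flash on the host
        self.dirty = set()            # keys not yet written to the backing store
        self.write_back = write_back

    def write(self, key, value):
        self.cache[key] = value
        if self.write_back:
            self.dirty.add(key)        # remote copy is updated later, on flush()
        else:
            self.backing[key] = value  # write-through: remote copy updated now

    def read(self, key):
        if key not in self.cache:      # miss: fetch from the remote store
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def flush(self):
        for key in self.dirty:
            self.backing[key] = self.cache[key]
        self.dirty.clear()

    def lose_host(self):
        """Simulate losing the host: its cached data becomes unreachable."""
        self.cache.clear()
        self.dirty.clear()


remote_wt = {}
wt = HostFlashCache(remote_wt)                   # write-through
wt.write("block-7", "new data")
wt.lose_host()
print("block-7" in remote_wt)                    # True: the write survived

remote_wb = {}
wb = HostFlashCache(remote_wb, write_back=True)  # write-back, never flushed
wb.write("block-7", "new data")
wb.lose_host()
print("block-7" in remote_wb)                    # False: the write was lost
```

The write-back variant acknowledges writes before they reach the remote store, which is why a host failure before `flush()` loses them.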

So, data management spans the IOPS tier and the Capacity tier. You need to have automated data movement and global access. It is impractical to have manual placement at huge scale. It is going to be important for us to have a technology-independent language to specify the desired properties for the data. Say what properties you want and not how to get them. New SLOs (Service Level Objectives) would cover max latency, bandwidth, availability, etc.
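As a sketch of what "say what you want, not how to get it" might look like, here is a minimal example. Everything in it is my own assumption for illustration: the `SLO` and `Tier` types, the `place` function, and all of the tier numbers. The talk did not present a concrete language.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SLO:
    """Desired properties for a data set (hypothetical fields)."""
    max_latency_ms: float
    min_bandwidth_mbps: float
    min_availability: float


@dataclass(frozen=True)
class Tier:
    """What a storage tier can deliver (illustrative numbers only)."""
    name: str
    latency_ms: float
    bandwidth_mbps: float
    availability: float
    dollars_per_gb: float


def place(slo: SLO, tiers: List[Tier]) -> Optional[Tier]:
    """Pick the cheapest tier ($/GB) that satisfies the SLO, or None."""
    candidates = [
        t for t in tiers
        if t.latency_ms <= slo.max_latency_ms
        and t.bandwidth_mbps >= slo.min_bandwidth_mbps
        and t.availability >= slo.min_availability
    ]
    return min(candidates, key=lambda t: t.dollars_per_gb, default=None)


# ASSUMED tier characteristics, not figures from the talk.
TIERS = [
    Tier("iops-tier", latency_ms=0.2, bandwidth_mbps=500, availability=0.999, dollars_per_gb=2.00),
    Tier("capacity-tier", latency_ms=10.0, bandwidth_mbps=200, availability=0.999, dollars_per_gb=0.10),
]

hot = SLO(max_latency_ms=1.0, min_bandwidth_mbps=100, min_availability=0.99)
cold = SLO(max_latency_ms=50.0, min_bandwidth_mbps=100, min_availability=0.99)

print(place(hot, TIERS).name)   # iops-tier
print(place(cold, TIERS).name)  # capacity-tier
```

The caller never names a tier. The data's owner states latency, bandwidth, and availability targets, and the placement logic is free to change as the underlying technology changes.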

In summary, host-based flash is the new IOPS-Tier. It will replace high-performance primary storage in the cloud. This IOPS-Tier is not going to be economical as secondary or archival storage, since its $/GB is higher than that of spinning media. The Capacity-Tier will be implemented as network storage, optimized for the best $/GB. Data management between these two tiers will be important.

There was a lot of discussion around the SLOs (Service Level Objectives). Steve indicated that their vision was not a super-flexible set of knobs but rather a small set of options for defining the data characteristics. Margo offered that one important aspect is the “value” of different parts of the data, which can, in turn, tell the system how hard to work to keep that data available.

Someone asked if there was any performance analysis for 3-tier storage (memory, IOPS-Tier, Capacity-Tier) over more traditional 2-tier storage (Memory and Disk). Steve answered that there had not been an explicit performance analysis.

Stonebraker observed that enterprises seem to like NAS/SAN but that the Cloud guys are using direct-attached storage. Are the forward-looking guys walking away from NAS/SAN? Steve replied that there are people very interested in NAS/SAN because it gives them a way to control the data and make servers stateless. Adding in the IOPS-Tier is a simple extension.

This was a fascinating and engaging presentation!
