Thursday, May 23, 2019

Dell EMC DSS 8440: A Dynamic Machine Learning Server

We are taking advantage of this year’s Dell Technologies World gathering to introduce Dell EMC’s latest machine learning offering to our customers. Our Extreme Scale Infrastructure (ESI) team, by design, is constantly pushing boundaries to solve the most pressing problems in today’s large-scale data centers. With the increasing demand for machine learning solutions, we are excited to announce the DSS 8440 accelerator-optimized server, specifically designed for high-performance machine learning training.



The Challenge


Data center workloads continue to evolve in challenging ways as the computing landscape responds to the rapid advancement of new technologies. The availability of massive amounts of data – both structured and unstructured – and the emergence of cloud native applications – with their demands for higher throughput and parallel computing – are driving data centers to look for more advanced processing solutions to incorporate into their existing infrastructures. In particular, they are looking for accelerator solutions that deliver more computing horsepower than the general-purpose CPUs that are becoming a bottleneck for overall processing.

The DSS 8440 is a 4U, 2-socket accelerator-optimized server designed to deliver exceptionally high compute performance. Its open architecture maximizes customer choice for machine learning infrastructure while also delivering best-of-breed technology from the #1 server provider and the #1 GPU provider. It lets you tailor your machine learning infrastructure to your specific needs – without lock-in.

With a choice of 4, 8 or 10 industry-leading NVIDIA® Tesla® V100 Tensor Core GPUs, combined with 2 Intel CPUs for system functions, a high-performance switched PCIe fabric for rapid I/O, and up to 10 local NVMe and SATA drives for optimized data access, this server has both the performance and the flexibility to be an ideal solution for machine learning training – as well as for other compute-intensive workloads such as simulation, modeling and predictive analysis in engineering and scientific environments.

The DSS 8440 and Machine Learning


Machine learning encompasses two distinctly different workloads: training and inference. While each benefits from accelerators, they do so in different ways and rely on different accelerator characteristics, which may vary from accelerator to accelerator. The initial release of the DSS 8440 is specifically targeted at complex training workloads. It provides more of the raw compute horsepower needed to quickly process the increasingly complicated models being developed for workloads like image recognition, facial recognition and natural language translation.

At the simplest level, machine learning training involves “training” a model by iteratively running massive amounts of data through a weighted, multi-layered algorithm (thousands of times!), comparing the output against a specific target outcome and adjusting the model’s weights after each pass, until the result is a “trained” model that makes fast and accurate predictions. Inference is the production or real-time use of that trained model to make relevant predictions based on new data.
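To make the training/inference distinction concrete, here is a minimal sketch in Python. It assumes a toy one-layer linear model (y = w*x + b) trained with plain gradient descent rather than a real multi-layer network, and the names `train` and `infer` are hypothetical. Training loops over the data many times, compares each prediction with its target and nudges the weights; inference is a single forward pass through the finished model.

```python
# Toy illustration only: a one-layer linear model, not a deep network.

def train(data, epochs=5000, lr=0.01):
    """Fit y = w*x + b by repeatedly passing over the data (gradient descent on MSE)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):              # many passes over the same data
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y        # compare prediction with the target outcome
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w                 # adjust the weights
        b -= lr * grad_b
    return w, b

def infer(model, x):
    """Inference: one forward pass through the trained model."""
    w, b = model
    return w * x + b

# Hypothetical training data generated from y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]
model = train(data)
prediction = infer(model, 20)
```

The cost asymmetry the post describes shows up directly: `train` performs `epochs * len(data)` prediction-and-update steps, while `infer` performs exactly one.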

Training workloads demand extremely high-performance compute capability. Training a model for a typical image recognition workload requires accelerators that can rapidly process multiple layers of matrices in a highly iterative way – accelerators that can scale to match the need. NVIDIA® Tesla® V100 GPUs are such an accelerator. The DSS 8440 with NVIDIA GPUs and a PCIe fabric interconnect has demonstrated scaling to near-equivalent performance (within 5%) of the industry-leading DGX-1 server when using the most common machine learning frameworks (e.g., TensorFlow) and popular convolutional neural network (CNN) models (e.g., for image recognition).

Note that Dell EMC is also partnering with the start-up accelerator company Graphcore to achieve new levels of training performance. Graphcore is developing machine-learning-specific, graph-based technology to enable even higher performance for training workloads. Graphcore accelerators will be available with the DSS 8440 in a future release. See the Graphcore sidebar for more details.

Inference workloads, while still requiring acceleration, do not demand as high a level of performance, because they only need one pass through the trained model to determine the result.

However, inference workloads demand the fastest possible response time, so they require accelerators that provide lower overall latency. While this release of the DSS 8440 is not targeted at inference, note that the accelerator card Graphcore is developing can support both training and inference. (It lowers overall latency by loading the full machine learning model into accelerator memory.)

Exceptional throughput performance


With the ability to scale up to 10 accelerators, the DSS 8440 can deliver higher performance for today’s increasingly complex computing challenges. Its low-latency, switched PCIe fabric for GPU-to-GPU communication enables it to deliver near-equivalent performance to competitive systems based on the more expensive SXM2 interconnect. In fact, for the most common type of training workloads, not only is the DSS 8440’s throughput exceptional, it also provides better power efficiency (performance per watt).

Most of the competitive accelerator-optimized systems in the marketplace today are 8-way systems. An obvious advantage of the DSS 8440’s 10-GPU scaling capability is that it can provide more raw horsepower for compute-hungry workloads. That extra horsepower can be concentrated on increasingly complex machine learning tasks or, conversely, distributed across a wider range of workloads – whether machine learning or other compute-intensive tasks. This type of distributed, departmental sharing of accelerated resources is common practice in scientific and academic environments, where those resources are at a premium and typically need to be reassigned among dynamic projects.
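One common way to share a multi-GPU server between projects is to give each project's jobs a disjoint set of GPU indices through the CUDA `CUDA_VISIBLE_DEVICES` environment variable. The sketch below is a hypothetical helper, not a Dell EMC or NVIDIA tool; the function name `partition_gpus` and the project names are assumptions made for illustration. It splits the 10 GPUs of a fully populated DSS 8440 into contiguous blocks and produces the comma-separated index string each project would export.

```python
# Hypothetical helper: split a server's GPUs into disjoint, contiguous blocks per project.

def partition_gpus(total, requests):
    """Return {project: "i,j,..."} suitable as a CUDA_VISIBLE_DEVICES value per project."""
    if sum(requests.values()) > total:
        raise ValueError("requested more GPUs than are installed")
    assignment, next_gpu = {}, 0
    for project, count in requests.items():   # dicts preserve insertion order
        ids = range(next_gpu, next_gpu + count)
        assignment[project] = ",".join(str(i) for i in ids)
        next_gpu += count
    return assignment

# Example: a 10-GPU server shared by two hypothetical projects.
plan = partition_gpus(10, {"vision-training": 6, "simulation": 4})
```

Each project would then launch its jobs with its own value, for example `CUDA_VISIBLE_DEVICES=6,7,8,9` for the simulation work, so the two workloads never contend for the same GPU. Reassigning resources between projects is just a matter of recomputing the plan.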

Better performance per watt


One of the challenges faced as accelerator capacity is increased is the additional energy required to drive an increased number of accelerators. Large scale data centers understand the importance of energy savings at scale. The DSS 8440 configured with 8 GPUs has proven to be more efficient on a performance per watt basis than a similarly configured competitive SXM2-based server – up to 13.5% more efficient.

That is, when performing convolutional neural network (CNN) training for image recognition it processes more images than the competitive system, while using the same amount of energy. This testing was done using the most common machine learning frameworks – TensorFlow, PyTorch and MXNet – and in all three cases the DSS 8440 bested the competition. Over time, and at data center scale, this advantage can result in significant operational savings.
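Performance per watt is throughput divided by power draw; for image training, images per second divided by watts gives images processed per joule of energy. The short sketch below shows how a percentage efficiency advantage is computed. The numbers are hypothetical, chosen only so the result matches the 13.5% figure quoted above; they are not DSS 8440 measurements, and the function names are illustrative.

```python
# Illustrative arithmetic only; throughput and power figures are hypothetical.

def perf_per_watt(images_per_sec, watts):
    """Images processed per joule (images/s divided by J/s)."""
    return images_per_sec / watts

def efficiency_gain_pct(a, b):
    """Percent by which perf/watt value a exceeds perf/watt value b."""
    return (a / b - 1.0) * 100.0

# Two systems drawing the same power, one processing more images per second.
system_a = perf_per_watt(1135.0, 2000.0)
system_b = perf_per_watt(1000.0, 2000.0)
gain = efficiency_gain_pct(system_a, system_b)
```

Because both systems draw the same power here, the efficiency gain equals the throughput gain: 13.5% more images for the same energy, which is exactly the framing used in the paragraph above.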

Accelerated development with NVIDIA GPU Cloud (NGC)


When the DSS 8440 is configured with NVIDIA V100 GPUs, you get the best of both worlds – working with the world’s #1 server provider (Dell EMC) and the industry’s #1 provider of GPU accelerators (NVIDIA). In addition, you can take advantage of the work NVIDIA has done with NVIDIA GPU Cloud (NGC), a program that offers a registry of pre-validated, pre-optimized containers for a wide range of machine learning frameworks, including TensorFlow, PyTorch and MXNet. Along with the performance-tuned NVIDIA AI stack, these pre-integrated containers include the NVIDIA® CUDA® Toolkit, NVIDIA deep learning libraries and top AI software. They help data scientists and researchers rapidly build, train and deploy AI models to meet continually evolving demands.

More power, more efficiency – the DSS 8440


Solve tougher challenges faster. Reduce the time it takes to train machine learning models with the scalable acceleration provided by the DSS 8440. Whether detecting patterns in online retail, diagnosing symptoms in the medical arena or analyzing deep space data, more computing horsepower allows you to get better results sooner – improving service to customers, improving patient health and advancing the progress of research. And you can meet those challenges while simultaneously gaining greater energy efficiency for your data center. The DSS 8440 is the ideal machine learning solution for data centers that are scaling to meet the demands of today’s applications and want to contain the cost and inefficiencies that typically come with scale.

Sunday, April 14, 2019

Data & AI: The Crystal Ball into Your Future Success


Years ago, the future was much more opaque. Now, it’s tangible, visible and rising up all around us. It seems to be taking shape in real time, and much of that can be attributed to innovation in data and infrastructure, both individually and collectively.

As innovation in these areas accelerates, capabilities grow rapidly, particularly for enterprises that have reached a point of digital maturity and have secured access to quality data and accelerated infrastructure at scale. For others, however, data and analytics initiatives are still lacking: as their data continues to expand, they do not have the right building blocks in place to grow and change with it. In fact, a recent McKinsey survey of more than 500 executives found that more than 85% acknowledged they are only somewhat effective at meeting the goals they set for their data and analytics initiatives.

With both growing and mature data sets, the effects of enterprise deep learning and machine learning can be significant – automating processes, identifying trends in historical data and uncovering valuable intelligence that strengthens fast and accurate decision-making abilities – all of which can be used as a virtual crystal ball to refine predictions about the future and potentially its successes.

To do this correctly, companies should look at using their data analytics capabilities to not only improve their core operations, but also to launch entirely new business models and applications. First, they must solve for problems in the way data is generated, collected, organized and acted upon. Because, while the mechanics are important, the ultimate value of data doesn’t come from merely collecting it, but acting on the insights derived from it.

The key lies in a fundamental mind shift of evolving your organization into a technology company with a data-first mentality.

In my experience, there are three certainties for every company:

  1. Your data is going to grow faster than you expected.
  2. The use cases for this data are going to change.
  3. The business is always going to expect outcomes to be delivered faster.

The first step in the journey to becoming a technology company is simplifying the infrastructure by moving from legacy data systems to a more nimble, flexible, modernized data architecture that can bridge both structured and unstructured data to deliver deeper insights and performance at scale. Once data is consolidated onto a single, scalable analytics platform, the pace of discovery and learning can be accelerated to drive a more accurate strategic vision for both today and tomorrow.

At Dell EMC, we are dedicated to bringing new and differentiated value and opportunities to our customers globally. We are always looking toward current and future trends and technologies that will help customers better manage and take advantage of their growing data sets with deep learning and machine learning at scale.

Dell EMC Isilon does just that.

As an industry-leading scale-out network-attached storage platform designed for demanding enterprise data sets, Isilon simplifies management and gives you access to all your data, scaling from tens of terabytes to tens of petabytes per cluster. We also deliver all-flash performance and file concurrency in the millions, allowing us to support the bandwidth needs of thousands of GPUs running the most complex neural networks available. As a bonus, we accomplish this very economically, with over 80% storage utilization, data compression and automated tiering across flash and disk in a single cluster. Finally, Isilon-based AI increases operational flexibility with multiprotocol support, allowing you to bring analytics to the data to accelerate AI innovation with faster cycles of learning, higher model accuracy and improved GPU utilization.

In an era of change and ongoing data expansion, creating a crystal ball for your business is not a matter of luck or fortune telling. It comes from a focused strategy for doing more with the data you have at hand. By offering innovative new ways to store, manage, protect and use data at scale, Isilon moves customers that much closer to becoming technology companies and future-proofing their businesses.