Tuesday, February 19, 2013

Mastering Apache Spark, by Mike Frampton

Reading Mastering Apache Spark, by Mike Frampton, is a worthwhile pastime that you can take up at any time. Reading a book does not restrict your other activities, demands little time, and costs little money. The book is affordable and easy to obtain, and it may offer you something new that you have not yet tried.

Gain expertise in processing and storing data by using advanced techniques with Apache Spark

About This Book

  • Explore the integration of Apache Spark with third-party applications such as H2O, Databricks and Titan
  • Evaluate how Cassandra and HBase can be used for storage
  • An advanced guide that combines instructions and practical examples to extend the most up-to-date Spark functionality

Who This Book Is For

If you are a developer with some experience with Spark and want to strengthen your knowledge of how to get around in the world of Spark, then this book is ideal for you. Basic knowledge of Linux, Hadoop and Spark is assumed. Reasonable knowledge of Scala is expected.

What You Will Learn

  • Extend the tools available for processing and storage
  • Examine clustering and classification using MLlib
  • Discover Spark stream processing via Flume and HDFS
  • Create a schema in Spark SQL, and learn how a Spark schema can be populated with data
  • Study Spark-based graph processing using Spark GraphX
  • Combine Spark with H2O and deep learning and learn why it is useful
  • Evaluate how graph storage works with Apache Spark, Titan, HBase and Cassandra
  • Use Apache Spark in the cloud with Databricks and AWS

In Detail

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, including graph processing, machine learning, stream processing and SQL. It operates at unprecedented speeds, is easy to use and offers a rich set of data transformations.

This book aims to take your limited knowledge of Spark to the next level by teaching you how to expand Spark functionality. The book commences with an overview of the Spark ecosystem. You will learn how to use MLlib to create a fully working neural net for handwriting recognition. You will then discover how stream processing can be tuned for optimal performance and to ensure parallel processing. The book goes on to show how to incorporate H2O for machine learning, Titan for graph-based storage, and Databricks for cloud-based Spark. Intermediate Scala-based code examples are provided for Apache Spark module processing in a CentOS Linux and Databricks cloud environment.

Style and approach

This book is an extensive guide to Apache Spark modules and tools and shows how Spark's functionality can be extended for real-time processing and storage with worked examples.

  • Amazon Sales Rank: #610501 in eBooks
  • Published on: 2015-09-30
  • Released on: 2015-09-30
  • Format: Kindle eBook

About the Author

Mike Frampton is an IT contractor, blogger, and IT author with a keen interest in new technology and big data. He has worked in the IT industry since 1990 in a range of roles (tester, developer, support, and author). He has also worked in many other sectors (energy, banking, telecoms, and insurance). He now lives by the beach in Paraparaumu, New Zealand, with his wife and teenage son. Being married to a Thai national, he divides his time between Paraparaumu and their house in Roi Et, Thailand, between writing and IT consulting. He is always keen to hear about new ideas and technologies in the areas of big data, AI, IT and hardware, so look him up on LinkedIn (http://linkedin.com/profile/viewid=73219349) or his website (http://www.semtech-solutions.co.nz/#!/pageHome) to ask questions or just to say hi.



Most helpful customer reviews

1 of 1 people found the following review helpful.
bleeding edge Spark
By Ian Stirk

Hi, I have written a detailed chapter-by-chapter review of this book on www DOT i-programmer DOT info; the first and last parts of that review are given here. For my review of all chapters, search i-programmer DOT info for STIRK together with the book's title.

This book aims to provide a practical discussion of Spark and its major components. How does it fare?

Spark is an increasingly popular Big Data technology, generally performing processing much faster than traditional MapReduce jobs.

This book is for anyone who wants to know more about Spark. In particular, the basic Spark components are discussed, and then Spark is extended with some of the more experimental components.

The book assumes a basic knowledge of Linux, Hadoop, Spark, SBT, and a reasonable knowledge of Scala. The author suggests using the internet to fill any gaps in your prerequisite knowledge.

Below is a chapter-by-chapter exploration of the topics covered.

Chapter 1 Apache Spark

The chapter opens with an overview of Spark as a distributed, scalable, in-memory, parallel processing data analytics system. Spark can be programmed in various languages, including Java, Python, and Scala. The examples in this book use Scala.

The chapter discusses, in outline, the 4 major Spark components (i.e. Machine Learning, Streaming, SQL, and Graph processing), cloud integration, and the future of Spark. Cluster design is briefly examined; it’s noted that Spark doesn’t have its own storage system, so Hadoop is often used. This has the advantage that Spark can become another component in the Hadoop toolset.

The chapter continues with a look at cluster management and configuring the Spark cluster. Useful discussions and diagrams explaining the Spark master, worker nodes, client nodes and Spark context are provided. This is followed by a section that examines cluster management running as: local, standalone, using YARN, using Mesos, and using Amazon’s Elastic Compute Cloud (EC2).

Next, performance is briefly examined. Topics include: cluster structure (cloud or shared boxes are often slower), putting applications on their own separate nodes, allocating sufficient memory, and filtering data early in the ETL process.

The chapter ends with a look at the cloud; it’s suggested this is the future direction of technology, with Spark as a service. Various providers are briefly discussed (e.g. Databricks and Google cloud).

This chapter provides a helpful overview of what Spark is, its major components, its various cluster managers, Spark architecture, and its future. Subsequent chapters expand on the major Spark components, and discuss its promising future in the cloud.

Useful discussions, diagrams, configuration settings, practical example code, website links, and inter-chapter links are given throughout. These traits apply to the whole of the book.

...

Conclusion

This book has well-written discussions, helpful examples, diagrams, website links, inter-chapter links, and useful chapter summaries. It contains plenty of step-by-step code walkthroughs to help you understand the subject matter.

The book describes Spark’s major components (i.e. Machine Learning, Streaming, SQL, and Graph processing), each with practical code examples. Some of the template code could form the basis of your own application code.

Several of the core Spark components are extended using less well-known components, many of which are still works in progress. I’m not sure how many readers will find these chapters/sections useful, since they often involve workarounds, or the components might not exist or be superseded later; they can also distract from the book’s core. That said, if you enjoy working at the bleeding edge of technology, you’ll enjoy what these extensions add.

Although the book assumes some knowledge of Spark, for completeness, it might have been useful to have some introduction to it (e.g. explain RDDs, introduce the spark-shell etc). Developers coming from a Windows environment might initially struggle to understand Linux, SBT, JARs etc.

Despite these concerns, I enjoyed this book; it contains plenty of useful detail. Spark is a rapidly changing technology, so check the Spark website for the latest changes. The book is highly recommended.

1 of 1 people found the following review helpful.
Using Spark with other big data technologies
By Antony Arokiasamy

The book provides a super fast, short introduction to Spark in the first chapter and then jumps straight into MLlib, Spark Streaming, Spark SQL, GraphX, etc. in subsequent chapters.

A huge positive for this book is that it not only talks about Spark itself, but also covers using Spark with other big data technologies like Hadoop, Kafka, Titan, Neo4j, HBase, Cassandra, H2O, etc. More on this below.

True to its name, the book covers more than simple introductory Spark topics, but it concentrates on breadth rather than depth. There is decent coverage and enough code examples for each topic, but what it lacks is depth. There are no "best practices", "performance" or "watch out for" type discussions, nor any type of advanced code.

The MLlib chapter covers Naive Bayes, K-Means and Artificial Neural Networks (ANN). For each algorithm, the theory is very briefly introduced and then the chapter jumps right into detailed code walkthroughs.

The Spark Streaming chapter introduces Streaming briefly and jumps straight into using different streaming sources and code walkthroughs of how to use them. This chapter covers TCP streams, File streams, Flume and Kafka sources.

By now the pattern of the chapters should be evident. The next chapter on Spark SQL follows the same format, covering different data sources like text, JSON, Parquet and Hive, with DataFrame/SQL code examples.

GraphX is covered in the next two chapters. Integration of GraphX with Neo4j and Titan (with both HBase and Cassandra backing stores) is covered extensively.

Finally, H2O integration and the Databricks hosted Spark offering are discussed.

I would definitely recommend this as the second Spark book after any introductory Spark book (or the Spark documentation).

0 of 0 people found the following review helpful.
... books on Spark and this is one of the best ones I've read
By Brett Palmer

I have several books on Spark and this is one of the best ones I've read. The book provides a good balance of introduction and advanced features to help you implement a Spark solution in your environment. The chapters are well written and the source code can be downloaded from Packt. The book also introduces other open source and commercial products that can be used with Spark to provide solutions for your own big data project.

Here are some of the chapters I found particularly helpful:

  • Apache Spark MLlib: Apache's machine learning library that comes with Spark
  • Apache Spark Streaming
  • Apache Spark GraphX: also includes chapters on storage of graph objects
  • Extending Spark with H2O
  • Spark Databricks: a commercial product that makes it easier to create an analytics cluster in the cloud with Spark

The Kindle version is formatted well and easy to read. You can jump to specific chapters or read the entire book from start to finish. Good luck in your Big Data endeavors.


