Mastering Apache Spark

Srinivas Reddy (@mrsrinivas) started discussion #96

2 years ago · 0 comments


Spark relies on data locality, also known as data placement or proximity to the data source, which makes Spark jobs sensitive to where the data is located. It is therefore important to run Spark on a Hadoop YARN cluster if the data comes from HDFS, so that executors can be scheduled on the nodes that hold the HDFS blocks.

Data Locality
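
The excerpt's locality preference can be tuned through the `spark.locality.wait` family of settings, which control how long the scheduler waits for a slot at a task's preferred locality level before falling back to a less local one. A minimal sketch, assuming an HDFS input (the app name and path are placeholders; `3s` is Spark's documented default):

```scala
import org.apache.spark.sql.SparkSession

// Placeholder app name; locality waits shown at their default value (3s).
val spark = SparkSession.builder()
  .appName("locality-demo")
  .config("spark.locality.wait", "3s")       // wait before falling back a locality level
  .config("spark.locality.wait.node", "3s")  // wait specifically for NODE_LOCAL slots
  .getOrCreate()

// Reading from HDFS: each partition's preferred locations are the HDFS
// block locations, which the scheduler uses for locality-aware placement.
val lines = spark.sparkContext.textFile("hdfs:///path/to/input")
println(lines.count())
```

Setting `spark.locality.wait` to `0` disables the wait entirely, trading locality for faster scheduling.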

Won't Spark standalone schedule tasks in a NODE_LOCAL way?

No description provided.

No comments on this discussion.
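
For anyone revisiting the open question: one way to check empirically which locality level a standalone cluster actually achieves is to register a `SparkListener` and log each task's locality. A minimal sketch (the `LocalityLogger` class is illustrative, not from the thread):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Illustrative listener that prints the locality level each task ran at,
// e.g. PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, or ANY.
class LocalityLogger extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    println(s"Task ${info.taskId} finished with locality ${info.taskLocality}")
  }
}

// Register on an existing SparkContext before running jobs:
// sc.addSparkListener(new LocalityLogger)
```

The Spark UI shows the same information in the Locality Level column of each stage's task table.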
