Mastering Apache Spark


Srinivas Reddy (@mrsrinivas) started discussion #95

2 years ago · 0 comments


Spark relies on data locality (also known as data placement or proximity to the data source), which makes Spark jobs sensitive to where the data is located. It is therefore important to run Spark on a Hadoop YARN cluster if the data comes from HDFS.
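Whatever the cluster manager, how long Spark's scheduler waits for a data-local slot before falling back to a less-local level is controlled by the `spark.locality.wait` family of properties. A minimal sketch of the relevant entries in `spark-defaults.conf` (the values shown are Spark's documented defaults):

```properties
# How long to wait to launch a data-local task before giving up and
# scheduling it at a less-local level (PROCESS_LOCAL -> NODE_LOCAL
# -> RACK_LOCAL -> ANY). Default is 3s.
spark.locality.wait          3s

# Per-level overrides; each falls back to spark.locality.wait when unset.
spark.locality.wait.process  3s
spark.locality.wait.node     3s
spark.locality.wait.rack     3s
```

Raising these waits makes the scheduler hold out longer for NODE_LOCAL (or better) placement; setting them to 0 disables locality waiting entirely.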

Data Locality

Won't Spark standalone schedule tasks in a NODE_LOCAL way?

Srinivas Reddy @mrsrinivas commented 2 years ago

Do we really have to go for Spark on YARN for it?