Mastering Apache Spark


ryan-factual (@ryan-factual) started discussion #145

a year ago · 0 comments


Can the number of partitions be fewer than the number of blocks?

[spark programming guide](http://spark.apache.org/docs/latest/programming-guide.html#external-datasets) says:

> Note that you cannot have fewer partitions than blocks.

But [Partitions and Partitioning](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-rdd-partitions.html) says:

> Ideally, you would get the same number of blocks as you see in HDFS, but if the lines in your file are too long (longer than the block size), there will be fewer partitions.

These two statements are in direct conflict: the programming guide says a partition count below the block count is impossible, while the book says long lines can produce exactly that. Which one is correct?
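For context on where the guide's claim comes from: `sc.textFile` delegates split computation to Hadoop's `FileInputFormat.getSplits`, where the split size is roughly `max(minSize, min(goalSize, blockSize))` with `goalSize = totalSize / minPartitions`. Below is a minimal sketch of that formula in plain Python (the function names are my own, and it ignores details such as Hadoop's split-slop factor and multi-file inputs):

```python
def split_size(total_size, block_size, min_partitions=1, min_size=1):
    # goalSize = totalSize / numSplits; splitSize = max(minSize, min(goalSize, blockSize))
    # (simplified from Hadoop's FileInputFormat.getSplits)
    goal_size = total_size // max(min_partitions, 1)
    return max(min_size, min(goal_size, block_size))

def num_splits(total_size, block_size, min_partitions=1):
    # A file is carved into chunks of split_size; the remainder
    # becomes one final, smaller split (ceiling division).
    size = split_size(total_size, block_size, min_partitions)
    return -(-total_size // size)

# A 256 MB file with 128 MB blocks yields 2 splits by default...
print(num_splits(256 * 1024**2, 128 * 1024**2))                     # 2
# ...and asking for minPartitions=4 shrinks the split size to get 4.
print(num_splits(256 * 1024**2, 128 * 1024**2, min_partitions=4))   # 4
```

Since the split size is capped at the block size, the number of *splits* can never be fewer than the number of blocks, which seems to be what the guide means. The book's scenario is about what the line-oriented record reader then does at split boundaries, so the two statements may be talking about different things.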

