Thursday, July 17, 2008

Hadoop - How many reduces?

How Many Reduces?

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).

With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.

The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative-tasks and failed tasks.
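A minimal sketch of how this rule of thumb might be applied with the old mapred API (circa Hadoop 0.18). The class name ReduceCount and the default of 2 slots per node are assumptions for illustration; the doc quote above only gives the 0.95 / 1.75 factors and the mapred.tasktracker.reduce.tasks.maximum property.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReduceCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReduceCount.class);

        // Number of task-tracker nodes currently in the cluster.
        int nodes = new JobClient(conf).getClusterStatus().getTaskTrackers();

        // Per-node reduce slot limit; 2 is an assumed fallback default.
        int slotsPerNode = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);

        // 0.95: all reduces launch at once as maps finish.
        // 1.75: faster nodes run a second wave for better load balancing.
        int reduces = (int) (0.95 * nodes * slotsPerNode);

        conf.setNumReduceTasks(reduces);
        // ... then set mapper/reducer classes, input/output paths, and submit
        // with JobClient.runJob(conf).
    }
}
```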

I don't fully understand this, so let me guess: it means the number of reducers is positively related to the number of data nodes and the maximum reduce tasks per node...
So more reducers increases the framework overhead, but improves load balancing and lowers the cost of failures.
