Wednesday, July 30, 2008

[Hadoop] Automatic namespace recovery from the secondary image.

Awesome, I found out how to recover from the secondary namenode~~~~
You can refer to this link.

I've excerpted the key points below:

We had a real example of such failure on one of our clusters.
And we were able to reconstruct the namespace image from the secondary node using the following manual procedure, which might be useful for those who find themselves in the same type of trouble.

Manual recovery procedure from the secondary image.
1. Stop the cluster to make sure all data-nodes and *-trackers are down.
2. Select a node where you will run the new name-node, and set it up as usual for a name-node.
3. Format the new name-node.
4. cd <dfs.name.dir>/current
5. You will see a file named VERSION in there. You will need to put the namespaceID of the old cluster in it.
The old namespaceID can be obtained from one of the data-nodes:
just copy the namespaceID value from <dfs.data.dir>/current/VERSION.
6. rm <dfs.name.dir>/current/fsimage
7. scp <secondary-node>:<fs.checkpoint.dir>/destimage.tmp ./fsimage
8. Start the cluster. Upgrade is recommended, so that you could rollback if something goes wrong.
9. Run Hadoop fsck, and remove files with missing blocks if any.
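
To make this concrete for myself, the steps above boil down to roughly the shell commands below. This is just a sketch: the paths, hostnames, and script names are placeholders for a typical Hadoop 0.x install, not part of the original instructions.

# 1-3. stop everything, then format the freshly configured name-node
bin/stop-all.sh
bin/hadoop namenode -format

# 4-5. fix up VERSION with the old cluster's namespaceID
#      (/data/hadoop/name is a placeholder for <dfs.name.dir>)
cd /data/hadoop/name/current
scp datanode-host:/data/hadoop/data/current/VERSION ./VERSION.datanode
# edit ./VERSION so its namespaceID matches the one in VERSION.datanode

# 6-7. replace the empty fsimage with the secondary's checkpoint image
rm fsimage
scp secondary-host:/data/hadoop/checkpoint/destimage.tmp ./fsimage

# 8. restart HDFS and MapReduce; upgrade mode lets you roll back if needed
bin/start-dfs.sh -upgrade
bin/start-mapred.sh

# 9. look for files with missing blocks and remove them if any
bin/hadoop fsck /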

Automatic recovery proposal.

The proposal consists of two parts.

1. The secondary node should store the latest check-pointed image file in compliance with the name-node storage directory structure. It is best if the secondary node uses the Storage class (or FSImage, if code re-use makes sense here) to maintain the checkpoint directory.
This should ensure that the checkpointed image is always ready to be read by a name-node if the directory is listed in its "dfs.name.dir" list.
2. The name-node should consider the configuration variable "fs.checkpoint.dir" as a possible location of the image available for read-only access during startup.
This means that if the name-node finds all directories listed in "dfs.name.dir" unavailable, or finds their images corrupted, it should turn to the "fs.checkpoint.dir" directory and try to fetch the image from there. I think this should not be the default behavior but rather be triggered by a name-node startup option, something like:

hadoop namenode -fromCheckpoint

So the name-node can start with the secondary image as long as the secondary node drive is mounted.
And the name-node will never attempt to write anything to this drive.
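
If this proposal gets implemented, I imagine starting a name-node from the secondary's image would look something like the sketch below. This is purely hypothetical: -fromCheckpoint is only a proposed option in the text above, so it may not exist (or may be spelled differently) in a real release, and the mount point and paths are made up.

# mount the secondary node's checkpoint drive on the name-node host, read-only
# (e.g. over NFS; hostname and paths are placeholders)
mount -o ro secondary-host:/export/checkpoint /mnt/secondary-checkpoint

# hadoop-site.xml on the name-node would then point "fs.checkpoint.dir" at it:
#   <property>
#     <name>fs.checkpoint.dir</name>
#     <value>/mnt/secondary-checkpoint</value>
#   </property>

# start the name-node from the checkpointed image; per the proposal it only
# reads from this directory and never writes to it
bin/hadoop namenode -fromCheckpoint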


Added bonuses provided by this approach

* One can choose to restart the failed name-node directly on the node where the secondary node ran.
This brings us a step closer to a hot standby.
* Replication of the image to NFS can be delegated to the secondary name-node if we support
multiple entries in "fs.checkpoint.dir". This is, of course, only if the administrator
chooses to accept outdated images in order to boost name-node performance.

For now I'll handle it the manual way, and see how the automatic recovery turns out later.

------------------------------------------------------------------------------------
Update:
After a lot of experimenting, I found that it's not actually implemented yet. Backing up the namenode files myself is faster, and I've confirmed that can fully restore HDFS....>_<
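
For reference, backing up the namenode files is basically just archiving the name-node's storage directory while HDFS is stopped. A minimal sketch, assuming dfs.name.dir is /data/hadoop/name and /backup is just an example destination:

bin/stop-all.sh
# archive the name-node metadata (fsimage, edits, VERSION, ...)
tar czf /backup/namenode-$(date +%Y%m%d).tar.gz -C /data/hadoop name
bin/start-all.sh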
