Wednesday, July 30, 2008

[Hadoop] Automatic namespace recovery from the secondary image

Awesome, I've found out how secondary namenode recovery works~~~~
You can refer to this link.

I've excerpted the key points below:

We had a real example of such failure on one of our clusters.
And we were able to reconstruct the namespace image from the secondary node using the following manual procedure, which might be useful for those who find themselves in the same type of trouble.

Manual recovery procedure from the secondary image.
1. Stop the cluster to make sure all data-nodes and *-trackers are down.
2. Select a node where you will run the new name-node, and set it up as usual for a name-node.
3. Format the new name-node.
4. cd <dfs.name.dir>/current
5. You will see the file VERSION in there. You will need to provide the namespaceID of the old cluster in it.
The old namespaceID can be obtained from one of the data-nodes:
just copy it from the namespaceID line of <dfs.data.dir>/current/VERSION.
6. rm <dfs.name.dir>/current/fsimage
7. scp <secondary-node>:<fs.checkpoint.dir>/destimage.tmp ./fsimage
8. Start the cluster. Upgrade is recommended, so that you can roll back if something goes wrong.
9. Run hadoop fsck, and remove files with missing blocks if any.
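
Putting steps 4-7 together as shell commands (just a sketch; the host name and every path here are placeholders for your own dfs.name.dir, dfs.data.dir and fs.checkpoint.dir settings):

cd /path/to/dfs.name.dir/current              # storage dir of the freshly formatted name-node
grep namespaceID VERSION                      # compare with /path/to/dfs.data.dir/current/VERSION on a data-node
vi VERSION                                    # put the old cluster's namespaceID in here
rm fsimage                                    # drop the empty image created by the format
scp secondary-host:/path/to/fs.checkpoint.dir/destimage.tmp ./fsimage   # pull in the checkpointed image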

Automatic recovery proposal.

The proposal consists of two parts.

1. The secondary node should store the latest check-pointed image file in compliance with the name-node storage directory structure. It is best if secondary node uses Storage class (or FSImage if code re-use makes sense here) in order to maintain the checkpoint directory.
This should ensure that the checkpointed image is always ready to be read by a name-node if the directory is listed in its "dfs.name.dir" list.
2. The name-node should consider the configuration variable "fs.checkpoint.dir" as a possible location of the image available for read-only access during startup.
This means that if name-node finds all directories listed in "dfs.name.dir" unavailable or finds their images corrupted, then it should turn to the "fs.checkpoint.dir" directory and try to fetch the image from there. I think this should not be the default behavior but rather triggered by a name-node startup option, something like:

hadoop namenode -fromCheckpoint

So the name-node can start with the secondary image as long as the secondary node drive is mounted.
And the name-node will never attempt to write anything to this drive.
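
If this lands, starting up from the secondary image might look something like this (purely a sketch of the proposal above, not a feature that exists today; the NFS export and mount point are made-up examples):

mount -t nfs -o ro secondary-host:/export/checkpoint /mnt/secondary-checkpoint
# with fs.checkpoint.dir pointing at /mnt/secondary-checkpoint in the name-node config:
bin/hadoop namenode -fromCheckpoint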


Added bonuses provided by this approach

* One can choose to restart a failed name-node directly on the node where the secondary node ran.
This brings us a step closer to a hot standby.
* Replication of the image to NFS can be delegated to the secondary name-node if we support
multiple entries in "fs.checkpoint.dir". This is, of course, if the administrator
chooses to accept outdated images in order to boost the name-node performance.

For now I'll handle it with the manual procedure, and I'll come back to automatic recovery later to see how it turns out.

------------------------------------------------------------------------------------
Update:
After a lot of experimenting, it turns out this simply isn't ready yet. Backing up the name-node's files yourself is faster, and I've confirmed that approach can fully restore HDFS....>_<
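
What I do instead, roughly (a sketch of a backup script; the directory, destination host and schedule are my own assumptions, and copying the image while the name-node is actively writing edits may not give a perfectly consistent snapshot):

#!/bin/sh
# back up the name-node metadata (fsimage + edits) to another machine
NAME_DIR=/path/to/dfs.name.dir       # whatever dfs.name.dir points to
BACKUP_HOST=backup-host              # any machine outside the cluster
STAMP=`date +%Y%m%d%H%M`
tar czf /tmp/namenode-meta-$STAMP.tar.gz -C "$NAME_DIR" current
scp /tmp/namenode-meta-$STAMP.tar.gz $BACKUP_HOST:/backup/hadoop/
rm -f /tmp/namenode-meta-$STAMP.tar.gz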

Tuesday, July 29, 2008

Hadoop: Configuration from Installation

Separating Configuration from Installation

In the example described above, the configuration files used by the Hadoop cluster all lie in the Hadoop installation. This can become cumbersome when upgrading to a new release since all custom config has to be re-created in the new installation. It is possible to separate the config from the install. To do so, select a directory to house Hadoop configuration (let's say /foo/bar/hadoop-config). Copy the hadoop-site.xml, slaves and hadoop-env.sh files to this directory. You can either set the HADOOP_CONF_DIR environment variable to refer to this directory or pass it directly to the Hadoop scripts with the --config option. In this case, the cluster start and stop commands specified in the above two sub-sections become
% $HADOOP_INSTALL/hadoop/bin/start-all.sh --config /foo/bar/hadoop-config
% $HADOOP_INSTALL/hadoop/bin/stop-all.sh --config /foo/bar/hadoop-config
Only the absolute path to the config directory should be passed to the scripts.
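
A sketch of the setup, using the paths from the quote above:

mkdir -p /foo/bar/hadoop-config
cp $HADOOP_INSTALL/hadoop/conf/hadoop-site.xml \
   $HADOOP_INSTALL/hadoop/conf/slaves \
   $HADOOP_INSTALL/hadoop/conf/hadoop-env.sh /foo/bar/hadoop-config/
# either export the variable once...
export HADOOP_CONF_DIR=/foo/bar/hadoop-config
$HADOOP_INSTALL/hadoop/bin/start-all.sh
# ...or pass the directory explicitly every time:
$HADOOP_INSTALL/hadoop/bin/start-all.sh --config /foo/bar/hadoop-config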

This passage is excerpted from the official Hadoop web site; you can click this link to go there.

This is quite useful: it lets you keep the Hadoop installation and its configuration separate. Noting it down here.

Hadoop secondary namenode configuration

In this link,
someone replied that setting up a secondary namenode requires setting dfs.secondary.http.address in hadoop-default.xml.
That's rather interesting. What exactly are the steps for setting up a secondary namenode?
Once I've figured it out, I'll decide whether to post it~~~~^^

-------------------------------------------------------------------------------
Update:
Set dfs.secondary.http.address in hadoop-site.xml,
plus fs.checkpoint.dir, and that's it.
fs.checkpoint.period and fs.checkpoint.size are up to personal preference.
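
For the record, the hadoop-site.xml entries look roughly like this (the address, path and numbers are example values only; the period is in seconds and the size in bytes):

<property>
  <name>dfs.secondary.http.address</name>
  <value>0.0.0.0:50090</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/home/hadoop/checkpoint</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>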
But in practice it didn't really help at all; backing things up yourself is faster, it just means writing a script...

Friday, July 25, 2008

My take on Hadoop

Facebook is officially using Hadoop
http://www.facebook.com/note.php?note_id=16121578919
What does that mean?....
Data keeps piling up, and the traditional way of doing data statistics is outdated,
which is why we're moving into cloud computing....
But notice that you have to be rich enough to throw 1,000 machines at the computation.
Say a 1U machine costs NT$100,000: that's NT$100 million spent on data statistics, and I doubt many companies can do that (Yahoo did).
Even if the machines are a bit cheaper at NT$30,000 each, that's still NT$30 million, and I don't think any company would want to.
None of that includes the engineers who maintain and program the thing.

So Google is now moving toward renting out computing power: if you don't want to spend the money, no problem, just rent from me and pay by your CPU usage or disk usage. Pretty slick.
When I attended Google Developer Day 2008 I really felt this coming, and anyone can take part, company or individual. The biggest open problem right now is: what if the data falls under confidentiality rules?....
That problem is actually easy to solve: just customize. A company says how many machines it needs, you set up an isolated zone for it, give it its own way to log in, let it use the zone on its own and charge for however long it uses it, and simply don't put its data on the same DFS as everyone else's.
But knowing Google's culture, they surely figure that earning money from everyone on the planet beats earning money from enterprises alone.

There's also another model: build data centers region by region, set up the cloud computing for the customer, charge for maintenance, and let them pay only for as long as they use it. Whether this works, though, depends on whether the region really has the demand.

Looking at Taiwan, there is basically no demand. Take the telecom operators: they have enormous amounts of data, but it's of no use because there's no revenue in it. You could do marketing based on behavior analysis, but nobody dares guarantee that the revenue it brings in would cover the cost and leave something over. Then again, judging by the rumor that a certain operator would make hundreds of millions just by billing an extra five seconds per call, I actually think it's promising. The way money is made is drifting away from being handset-driven. It used to be enough for an operator to bundle phone plans whenever Sony Ericsson or Nokia released a new handset, but handset makers can't profit that way anymore either and have moved toward value-added features, which is why HTC, the iPhone and the Gphone are all heading in that direction.

What does that mean? The handset industry has matured; it's no longer about making calls but about more services and more features, and all of that depends on content delivery. So the battlefield for handset makers has become how many services and features they can deliver, while operators no longer count on handsets to add subscribers; subscriber numbers are saturated, and what matters now is call minutes and data traffic.

Ha, I suddenly went off on quite a tangent, which is a bit odd. All I really wanted to say is that the battlefield is changing and everyday habits are starting to change, because cloud computing is getting closer to ordinary people and because data volumes are huge. From an engineer's point of view, the way we program is changing and the way we maintain machines is changing; from a business operator's point of view, the way to make money is changing too.

Of course all of this still takes time. How long, I don't know, but with influence as strong as Google's....
it should come in the not-too-distant future.

Actually I've been pondering another question too: does global warming have anything to do with there being too many computers??....

Wednesday, July 23, 2008

Using grep to match tabs in Linux

Writing this down so I don't forget it again.
If the data is a line of w characters separated by tabs, e.g.
w<TAB>w<TAB>w

then only
grep '[[:blank:]]w[[:blank:]]'
will actually match a w.
Remember the double brackets!!
A few more character classes and anchors follow:

[:alnum:] letters and digits
[:alpha:] letters
[:ascii:] ASCII characters
[:blank:] space or tab
[:cntrl:] ASCII control characters
[:digit:] digits
[:graph:] printable characters, excluding space (non-control, non-space)
[:lower:] lowercase letters
[:print:] printable characters
[:punct:] punctuation
[:space:] whitespace, including vertical tab
[:upper:] uppercase letters
[:xdigit:] hexadecimal digits

^ start of a line
$ end of a line
. any single character
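
To double-check the tab behaviour (assumes GNU grep and bash; /tmp/tabdata is just an example file):

printf 'w\tw\tw\n' > /tmp/tabdata
grep '[[:blank:]]w[[:blank:]]' /tmp/tabdata   # matches: the middle w has a tab on each side
grep $'\t' /tmp/tabdata                       # bash-only alternative: a literal tab via $'\t'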

Thursday, July 17, 2008

Hadoop - How many reduces?

How Many Reduces?

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).

With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces doing a much better job of load balancing.

Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.

The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative-tasks and failed tasks.

I don't fully understand this, so here's my guess: the number of reducers scales with the number of nodes times the per-node maximum number of reduce tasks....
So more reduces adds framework overhead, but improves load balancing and lowers the cost of failures.
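
A rough worked example (the node count and per-node maximum are made-up numbers):

NODES=10           # task-tracker nodes
MAX_PER_NODE=2     # mapred.tasktracker.reduce.tasks.maximum
echo $(( NODES * MAX_PER_NODE * 95 / 100 ))    # 0.95 factor -> 19 reduces, one wave
echo $(( NODES * MAX_PER_NODE * 175 / 100 ))   # 1.75 factor -> 35 reduces, two waves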

Wednesday, July 16, 2008

Hadoop map settings

How Many Maps?

The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.

The right level of parallelism for maps seems to be around 10-100 maps per-node, although it has been set up to 300 maps for very cpu-light map tasks. Task setup takes awhile, so it is best if the maps take at least a minute to execute.

Thus, if you expect 10TB of input data and have a blocksize of 128MB, you'll end up with 82,000 maps, unless setNumMapTasks(int) (which only provides a hint to the framework) is used to set it even higher.

So: your total data size / block size = the number of maps. Take a bow, exit stage left.
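
Checking the 10 TB / 128 MB example from the quote:

echo $(( 10 * 1024 * 1024 / 128 ))   # 10 TB in MB divided by the 128 MB block size -> 81920 (~82,000) maps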

Hadoop system

I recently saw this passage on the official Hadoop web site:

The client framework will implicitly associate the user identity with the connection to the name node, reducing the need for changes to the existing client API. It has always been the case that when one operation on a file succeeds, the operation might fail when repeated because the file, or some directory on the path, no longer exists. For instance, when the client first begins reading a file, it makes a first request to the name node to discover the location of the first blocks of the file. A second request made to find additional blocks may fail. On the other hand, deleting a file does not revoke access by a client that already knows the blocks of the file.

In my own words: if someone is reading a file while someone else deletes or modifies it, the read can fail.
In other words, Hadoop does not do any locking.... so if you read files from HDFS, remember to handle the errors.
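
For example, when pulling a file out of HDFS in a shell script, check the exit status instead of assuming the read succeeded (the HDFS path here is hypothetical):

if ! bin/hadoop dfs -cat /user/someone/part-00000 > part-00000.local; then
    echo "HDFS read failed - the file may have been deleted mid-read" >&2
    rm -f part-00000.local    # clean up the partial copy, then retry or skip
fi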

Thursday, July 10, 2008

Sloan Leadership Model

A leader embodies five capabilities:
forming a vision, analyzing the current situation, coordinating interests, experimenting with innovation, and motivating and empowering people.

If you're interested, you can click this link.
I hope I can reach all five of these capabilities myself.

At the moment I'd say I have the analyzing and innovating capabilities, in my own estimation anyway.
Whenever work is handed to me, I first figure out what the current situation is, so that at the very least I don't waste time redoing what others have already done.
Then I look within that situation for a suitable way to handle it and check whether the logic holds, rather than just following in someone else's footsteps.
As for forming a vision, I do think about which direction I'd take if I were the boss, but it's best not to do that too often; otherwise it gets a bit painful when you notice the current direction isn't quite what you had in mind.
Motivating and empowering is what I lack, since I haven't led people yet, but at least when working with team members I can usually get the collaboration running at its best, which comes down to judging who is best suited to which task.

Wednesday, July 09, 2008

URL Encode & Decode in Perl

Encode
$str =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
For example,
testpq1-1-abc.com/80/test/test.htm?redirect.asp%uhttp://abc.com
becomes, after encoding,
testpq1%2D1%2Dabc%2Ecom%2F80%2Ftest%2Ftest%2Ehtm%3Fredirect%2Easp%25uhttp%3A%2F%2Fabc%2Ecom

Decode
$str =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
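
A quick way to try both lines from the shell (just a sketch; if the URI::Escape module from CPAN is available, its uri_escape/uri_unescape functions are the more standard route):

echo 'testpq1-1-abc.com/80/test/test.htm?redirect.asp%uhttp://abc.com' | \
    perl -lpe 's/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg'
echo 'testpq1%2D1%2Dabc%2Ecom' | \
    perl -lpe 's/\%([A-Fa-f0-9]{2})/pack("C", hex($1))/seg'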

參考一下吧~~~^^