Summary of Join in MapReduce

March 9th, 2012 No comments

>> Copied from research blog.

MapReduce can perform joins between large datasets, but writing the join code from scratch is fairly involved. The basic problem is to reconcile two datasets that share a common field/key.

Reduce-Side Joins
A reduce-side join is compiled into a MapReduce task with a map stage and a reduce stage. A mapper reads from the join tables and emits each record as a (join key, join value) pair into an intermediate file. Hadoop sorts and merges these pairs in what's called the shuffle stage. The reducer takes the sorted results as input and does the actual join work. The shuffle is expensive because it has to sort and merge all of the intermediate pairs, so skipping the shuffle and reduce stages improves task performance.
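A minimal single-process sketch of the idea, not Hadoop code: the function names (map_phase, shuffle, reduce_phase) and the sample tables are made up for illustration. Each record is tagged with the table it came from, grouped by key the way the shuffle would group it, and joined in the reducer.

```python
from collections import defaultdict

def map_phase(table_name, records):
    """Emit (join_key, (table_tag, value)) for every input record."""
    for key, value in records:
        yield key, (table_name, value)

def shuffle(pairs):
    """Group mapper output by key and return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return sorted(groups.items())

def reduce_phase(key, tagged_values):
    """Join the R-side and S-side values that arrived under the same key."""
    left = [v for tag, v in tagged_values if tag == "R"]
    right = [v for tag, v in tagged_values if tag == "S"]
    for l in left:
        for r in right:
            yield key, l, r

def reduce_side_join(r_records, s_records):
    pairs = list(map_phase("R", r_records)) + list(map_phase("S", s_records))
    result = []
    for key, tagged_values in shuffle(pairs):
        result.extend(reduce_phase(key, tagged_values))
    return result

# Hypothetical sample data: (key, value) records from two tables.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "lamp")]
joined = reduce_side_join(users, orders)
print(joined)
```

Note that every record of both tables passes through shuffle(), including those for keys 2 and 4 that never find a match; this is the cost the other join types try to avoid.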

Map-Side Joins
The motivation for a map-side join is to skip the shuffle and reduce stages and do the join work in the map stage only. When one of the join tables is small enough (|R| < |S|/r) to fit into memory, every mapper can hold the small table in memory and do the join work there, so the whole join finishes in the map stage.

However, this type of map join has scaling problems. When thousands of mappers read the small join table from HDFS into memory at the same time, the join table easily becomes the performance bottleneck, causing the mappers to time out during the read operations.

To distribute the small table efficiently, MapReduce can put it into the distributed cache before launching the mappers. The distributed cache then copies the small table to a local cache on each node. With this optimization, the small table needs to be read from HDFS just once per node instead of once per mapper.
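A rough sketch of what each mapper does in a map-side join, assuming the small table has already been copied to the node. The helper names (load_small_table, map_side_join) and the data are hypothetical, and plain Python lists stand in for the cached file and the input split.

```python
def load_small_table(records):
    """Build an in-memory hash table from the small table.

    In a real job this would read the node-local copy from the distributed cache.
    """
    table = {}
    for key, value in records:
        table.setdefault(key, []).append(value)
    return table

def map_side_join(small_table, large_split):
    """Mapper body: probe the in-memory table for each record of the large table."""
    for key, value in large_split:
        for small_value in small_table.get(key, []):
            yield key, small_value, value

# Hypothetical sample data.
small = load_small_table([(1, "alice"), (3, "carol")])
split = [(1, "book"), (2, "cup"), (3, "pen")]
map_joined = list(map_side_join(small, split))
print(map_joined)
```

No shuffle or reducer appears anywhere in this sketch; the joined records are the mapper's output.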

Semi Joins
A reduce-side join may involve a lot of transfer cost, because both tables must be shipped to the reducers. In some scenarios, such as log processing, a large portion of the records in the two tables don't share a key, which means these records are not used by the join operation. A semi-join avoids sending records over the network that will not be used in the join.

Let X and Y be the two tables, with X smaller than Y. The semi-join implementation has two phases. The first phase collects the keys of table X and saves them in a separate file. In the second phase, the generated key file is distributed to all mappers through the distributed cache. Each record in table X or Y is output to the reducers only if its key appears in the key file. In this way, we avoid sending unused records.
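The two phases can be sketched roughly as follows. This is only an illustration: collect_keys and filter_mapper are invented names, and a Python set stands in for the key file shipped through the distributed cache.

```python
def collect_keys(x_records):
    """Phase 1: extract the distinct join keys of the smaller table X."""
    return {key for key, _ in x_records}

def filter_mapper(records, key_set):
    """Phase 2 mapper: forward a record only if its key is in the key file."""
    for key, value in records:
        if key in key_set:
            yield key, value

# Hypothetical sample data.
x_table = [(1, "alice"), (3, "carol")]
y_table = [(1, "book"), (2, "cup"), (3, "pen"), (4, "lamp")]

key_file = collect_keys(x_table)
y_sent = list(filter_mapper(y_table, key_file))
print(y_sent)
```

Only the records in y_sent would go through the shuffle; the records with keys 2 and 4 are dropped in the mapper and never cross the network.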

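If the key file is too large to be loaded into memory, we can go one step further and use a Bloom filter, a compact probabilistic summary of the keys. It can report false positives but never false negatives, so a few unused records may still be sent, but no needed record is dropped.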
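A minimal Bloom filter sketch, assuming SHA-256 with a seed prefix as the hash family. The class, its parameters (num_bits, num_hashes), and the data are illustrative choices and not something a particular Hadoop API provides.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        """Derive num_hashes bit positions from seeded SHA-256 digests."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """True if the key may have been added; False means definitely not added."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# Hypothetical usage: summarize the keys of X, then filter Y in the mapper.
bloom = BloomFilter()
for key in (1, 3):
    bloom.add(key)

y_records = [(1, "book"), (2, "cup"), (3, "pen"), (4, "lamp")]
y_kept = [(k, v) for k, v in y_records if bloom.might_contain(k)]
```

The filter here takes 128 bytes no matter how many keys are added; the trade-off is that the false-positive rate grows as more keys are inserted, so num_bits has to be sized for the expected number of keys.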

Categories: Cloud Computing

It's the Same Everywhere

April 16th, 2011 No comments

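First, take a look at this news item:

According to a report by "China Broadcasting News Network" on the 15th, with Prince William's wedding approaching, every one of his friends wants to get together with him before he ends his bachelor days. However, the dinner he hosted last Sunday at a restaurant in Wales for old friends from the military could not get a table and was forced to move elsewhere.
According to the report, William had originally booked a restaurant, but more than twenty guests were added at the last minute, so the party grew considerably. Reportedly, one of the chefs at the restaurant William had booked happened to be ill, so it could not accommodate the group, and the owner had to ask another restaurant for help. But when the other restaurant heard that Prince William wanted a reservation, it took this for a prank call, and it took three or four calls before they believed it.

A local businessman said that the restaurant William first booked is always very hard to get a table at, and every customer has to book in advance, which is fair. An American tourist, however, was surprised by this. He said that in the United States, if Obama wanted to dine at a restaurant, the owner would most likely clear the place for him.

Domestic media like to report on the daily lives of America's Obamas to show the absence of privilege, closeness to the people, and the public's don't-care attitude toward them. In fact it's the same everywhere: restaurants will clear the place all the same.
Another item:

The Sarkozys will spend the Easter holiday at the Bruni family villa on the French Riviera (pictured). To "protect the privacy of the presidential couple," the French government has declared the airspace above the villa a low-altitude no-fly zone; intruders face a fine of 40,000 euros or six months of detention.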
Categories: Others

Odds and Ends

April 11th, 2011 2 comments

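After switching themes back and forth, I've gone back to this one; the previous one turned into a mess when displaying English.

At least I don't need to change hosting for now, so I'll just stick with ipage. One nice thing about ipage is unlimited storage, although most people won't use much anyway. So it works quite well as a way to attract customers.

One funny thing about ipage is that you can find all kinds of discounts on its own website; even when you click cancel, it says: don't go, we'll give you a bigger discount.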

Categories: Others

Computer Science Conference Rankings

February 19th, 2011 No comments

AREA: Artificial Intelligence and Related Subjects

>>Rank 1
AAAI: American Association for AI National Conference
CVPR: IEEE Conf on Comp Vision and Pattern Recognition
IJCAI: Intl Joint Conf on AI
ICCV: Intl Conf on Computer Vision
ICML: Intl Conf on Machine Learning
KDD: Knowledge Discovery and Data Mining
KR:  Intl Conf on Principles of KR & Reasoning
NIPS: Neural Information Processing Systems
UAI: Conference on Uncertainty in AI
ICAA: International Conference on Autonomous Agents
ACL: Annual Meeting of the ACL (Association for Computational Linguistics)

Categories: Others

Computer Security Conference Ranking

February 19th, 2011 No comments

Usenix Security Symposium, NDSS, and RAID are all tier 2 conferences in the overall CS conference ranking.

In the security conference ranking, Usenix Security Symposium and NDSS are tier 1.

In the CS conference ranking, only CCS (ACM Conf on Comp and Communications Security) and S&P (IEEE Symposium on Security and Privacy) are tier 1.

The following ranking information is taken from Guofei Gu’s page.

>>Rank 1

S&P: IEEE Symposium on Security and Privacy
CCS: ACM Conference on Computer and Communications Security
Crypto: International Cryptology Conference
Eurocrypt: European Cryptology Conference
Security: Usenix Security Symposium
NDSS: ISOC Network and Distributed System Security Symposium

Startup Idea Generator

February 11th, 2011 No comments

An idea generator: http://www.ykombinator.com

Somewhat interesting.

Categories: Others

Kingsoft's AVC (AV-Comparatives) Test Results

January 19th, 2011 No comments

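Recently on Donews I saw an article in which Kingsoft talks up the effectiveness of its own antivirus; truly shameless. http://www.donews.com/original/201101/340839.shtm

From the official AVC website, here are the test results that are published every three months: http://www.av-comparatives.org/en/comparativesreviews

Taking the most basic one, the Detection Tests, as an example: http://www.av-comparatives.org/en/comparativesreviews/detection-test

The November 2010 test report does not include Kingsoft (Kingsoft's own explanation is that it did not want to participate; the rumor online is that Kingsoft was kicked out).

August 2010: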

Read more…

Categories: Others

Flash/AIR only supported on ARMv7+/Android 2.2

January 12th, 2011 No comments

I tried many ways to run our swf application on an HTC Dream (G1), without success, even though I successfully installed Adobe AIR 2.5 using “adb install ***”. AIR-based applications can be installed on the HTC Dream (G1), but they do not run properly.

Here’s Adobe’s minimum requirements to run Flash on Android devices:

  • Hardware requirements (VGA): Dedicated Cortex A8 (ARMv7) 550MHz App Processor with Neon for A8 only; Hardware Vector FPU
  • Hardware requirements (WVGA): Dedicated Cortex A8 (ARMv7) 800MHz App Processor; Hardware Vector FPU
  • Operating system: Android™ 2.2

http://www.adobe.com…dex.html#mobile

List of officially supported HTC Smartphones:
http://www.adobe.com…phones.html#htc

Some other requirements:

  • OpenGL ES2.0
  • H.264 & AAC H/W Decoders
  • 256 MB of RAM

Adobe should post a prominent notice on AIR’s official site. Otherwise, lots of developers will waste a lot of time on this issue. Stupid Adobe!!!!!!

Categories: Coding

Packaging a .swf into an .apk and installing it on an Android device

January 5th, 2011 No comments

for emulator

adt -package -target apk-emulator -storetype pkcs12 -keystore android.p12 Test.apk Test-app.xml Test.swf

for physical device

adt -package -target apk -storetype pkcs12 -keystore android.p12 Test.apk Test-app.xml Test.swf

If the apk cannot be installed, try -target apk-debug. Damn, this problem tormented me for several days...

ps:

MOBILE SYSTEM REQUIREMENTS

  • Android devices
    • Google Android™ 2.2 operating system
    • ARMv7-A processor with vector FPU
    • OpenGL ES 2
    • H.264 and AAC hardware decoders
    • 256MB of RAM
  • BlackBerry™ Tablet OS
  • iOS 3 and higher

ps2: The Google Dev Phone 1 runs OS 1.5.

Upgrade to 2.2 OS:

http://developer.htc.com/adp.html

http://wiki.cyanogenmod.com/index.php?title=Full_Update_Guide_-_Android_Dev_Phone_1

How to install APK Application Packages on Your Android Device

To install the mobile applications that are provided as APK files below, you must have a device running Android 2.2 (“Froyo”). To install an application from an APK file:

  1. Go to the Settings application on your device, choose Applications, and ensure “Unknown sources” is checked.
  2. Visit this page in your device’s web browser (http://adobe.com/go/mobilesamples). Note: This page is easier to read if you turn your phone to landscape mode and double-tap on the main text.
  3. Tap on an APK link to download it.
  4. Once it’s downloaded, pull down the notification area from the top of your device’s screen and tap on the downloaded APK to start the installation process.
  5. If you don’t yet have the AIR runtime installed on your device, you’ll be prompted to download it from the Android Market the first time you run an AIR application. After installing the runtime, run the AIR application again.
Categories: Coding

Manually Packaging a swf into an ipa

December 29th, 2010 No comments

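Noting this down so I don't forget it again next time.
Since the final release of Flash Builder Burrito still hasn't come out (rumored for the end of 2010, but it's already the 29th...), and the prerelease only supports Android, I ended up packaging the swf into an ipa file by hand.
The most important thing to watch out for here: do not use the swf file that Burrito generates itself; generate the swf from the command line instead.

First create a Flex Project in Burrito (I haven't tested other project types), then create the swf:

>Adobe Flash Builder Burrito\sdks\4.1.0\bin\amxmlc "Test.mxml"

This uses amxmlc to generate the swf.

Then use the pfi command to package it into an ipa file: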

>Adobe Flash CS 5\PFI\bin\pfi -package -target ipa-test -provisioning-profile Test.mobileprovision -storetype pkcs12 -keystore ***.p12 -storepass **** Test.ipa Test-app.xml Test.swf

If you use swc files or other ActionScript source code, some extra parameters are needed when generating the swf, as in the following example:

amxmlc -sp "C:\Documents and Settings\Bruno\Adobe Flash Builder 4\myapp\src" -el "C:\Program Files\Adobe\Adobe Flash Builder 4\sdks\4.1.0\frameworks\libs\air\airglobal.swc" -o "C:\Documents and Settings\Bruno\Adobe Flash Builder 4\myapp\myapp.swf" "C:\Documents and Settings\Bruno\Adobe Flash Builder 4\myapp\src\myapp.mxml"

ref:http://forums.adobe.com/thread/702429
http://forums.adobe.com/thread/731171