|
QQ wants to do search engine?
1. I only know 粗鲁法
2. For approximate solution, use streaming algorithm like sticky sampling or lossy counting, 1 pass and O(logn) (n=10,000,000 here) space consumption. Use 128bit SHA1 hashing function set (using a hash function set to avoid collision) is the other solution I can come up with, still 1 pass, defficiency is using 32bit * 10million ~ 300MB space for maintaining counter space, not really bad for memory in nowadays.
For exact solution, I heard (not 100% sure) google using 'tries' (this is tree-like data structure) to index huge text record.
3. Typical application for dynamic programming.
These three are a little bit too difficult for undergraduate students. I do not think the testers can come up with a good answer without peeping the solutions on a research paper.
[ 本帖最后由 小飞侠 于 2009-1-4 12:37 编辑 ] |
|