搜索资源列表
text_category
- 中文自动分类。使用spider抓取网络信息,利用lucene的分词及KNN方法。-Chinese automatic classification. The use of spider crawl network information, the use of Lucene sub-word and KNN methods.
ROSTDM
- 网页文本抓取,通过设置XML可以批量抓取任意网站的任意数据-Web text crawl, crawl any website any data volume by setting XML