Web content mining can be viewed from two perspectives: the information retrieval view and the database view. This paper summarizes the research on unstructured and semi-structured data from the information retrieval view. The survey shows that most studies represent unstructured text as a bag of words, which relies on statistics about single words in isolation, and take each single word found in the training corpus as a feature. For semi-structured data, all the surveyed works exploit the HTML structure inside the documents, and some also use the hyperlink structure between documents for document representation. In the database view, the aim is better information management and querying on the web, so the mining tries to infer the structure of a web site in order to transform the site into a database.
Keywords: content mining, database, HTML, query.
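
As a rough illustration of the two document representations mentioned in the abstract, the following minimal sketch builds a bag of words for plain text and adds simple structural features (tag counts and outgoing hyperlinks) for an HTML page. The names `bag_of_words`, `StructureFeatures`, and `represent` are hypothetical and chosen only for this example; the sketch uses Python's standard library and is not taken from any of the surveyed systems.

```python
from collections import Counter
from html.parser import HTMLParser
import re


def bag_of_words(text):
    """Count single words in isolation, ignoring order and context."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


class StructureFeatures(HTMLParser):
    """Collect tag counts, text fragments, and outgoing links from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()    # how often each HTML tag is opened
        self.links = []          # href targets of <a> elements
        self.text_parts = []     # text found between tags

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def represent(html):
    """Return word, tag, and link features for one semi-structured document."""
    parser = StructureFeatures()
    parser.feed(html)
    parser.close()
    return {
        "words": bag_of_words(" ".join(parser.text_parts)),
        "tags": parser.tags,
        "links": parser.links,
    }
```

Calling `represent` on a page yields a dictionary whose `words` entry is the bag-of-words view of the text, while `tags` and `links` capture the intra-document HTML structure and the inter-document hyperlink structure, respectively.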