Faculty of Industrial Engineering and Management
Information Retrieval (096262)
Have you ever wanted to learn how search engines like Google might find relevant pages in response to your query? Or how Clusty (clusty.com) groups your search results into topics? Or how you can automatically filter spam email without writing any rules for detecting it?
This course will introduce you to the core topics in the field of information retrieval: search, classification and clustering. We will cover the underlying mathematical models and experimental results of different approaches employed in this area. The course is intended for senior undergraduate and graduate students.
Prerequisites: courses in Probability (094411 or 094412), Linear Algebra (104009 or 104167), Software Engineering (094219), or equivalent.
Prof. Oren Kurland, firstname.lastname@example.org, http://iew3.technion.ac.il/~kurland
Office hours: TBD
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, “Introduction to Information Retrieval,” Cambridge University Press, 2008. (Available online at: http://www-nlp.stanford.edu/IR-book/)
• R. Baeza-Yates and B. Ribeiro-Neto, “Modern Information Retrieval,” (Second Edition), Addison-Wesley, 2011.
• W. Bruce Croft, Donald Metzler, and Trevor Strohman, “Search Engines: Information Retrieval in Practice,” Addison-Wesley, 2009.
• David A. Grossman and Ophir Frieder, “Information Retrieval: Algorithms and Heuristics,” (Second Edition), Springer, 2006.
• I. H. Witten, A. Moffat, and T. C. Bell, “Managing Gigabytes,” Morgan Kaufmann, 1999.
• W. B. Croft and J. Lafferty, “Language Modeling for Information Retrieval,” Springer, 2003.
• W. B. Croft, “Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval,” Kluwer, 2000.
(1) Understanding fundamental models and methods in the information retrieval field, specifically in the domains of search, classification, and clustering.
(2) Applying methods learned in class to develop information retrieval applications as well as novel information retrieval approaches.
Course Topics (tentative)
1. The architecture of a search engine.
2. Properties of text (e.g., Zipf’s law), indices, compression.
3. Performance evaluation of search engines.
4. Relevance ranking models (Boolean, vector space, probabilistic, statistical language models).
5. Relevance feedback (Rocchio, statistical relevance models).
6. Web retrieval (link analysis).
7. Classification (discriminative vs. generative models, Naïve Bayes, Rocchio, nearest neighbors, support vector machines).
8. Clustering (flat, hierarchical, dimension reduction (e.g., LSI)).
Course Expectations & Grading
· Final exam: 80–90 points
· Homework assignments: 10–20 points
· High-quality performance in a search engine competition, to be announced in class, will earn extra credit (added to the final course grade). Participation in the competition is optional.
Please note that students cannot pass the course without passing the final exam.
The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful. Ethical violations include cheating on exams, plagiarism, reuse of assignments, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition.