|
1
|
- Hang Cui
- Tianjin University, China
- Ji-Rong Wen
- Microsoft Research Asia
- Jian-Yun Nie
- University of Montreal
- Wei-Ying Ma
- Microsoft Research Asia
|
|
2
|
- Motivations
- Central ideas
- Probabilistic correlations and Query expansion
- Evaluations
- Conclusions
|
|
3
|
- The word mismatch problem in web search
- Inconsistent term usage between user queries and documents
- The Web is not well organized
- Users express queries in their own vocabulary
- Very short queries (fewer than two words)
- Simple keyword matching does not work well
|
|
4
|
- Big gap between the query space and the document space:
- 73.68 degrees on average (cosine similarity = 0.28)
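The gap is reported both as an angle and as a cosine. The sketch below shows how the two relate, assuming queries and documents are represented as bag-of-words term-frequency vectors (the slide does not state the exact representation). The toy query and document text are hypothetical.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def angle_degrees(cos_sim: float) -> float:
    """Convert a cosine similarity into the angle between the vectors, in degrees."""
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sim))))


# Hypothetical toy example: terms a user typed vs. terms in a clicked document.
query_terms = Counter("cheap flights paris".split())
doc_terms = Counter("low cost airfare to paris france airfare deals".split())
print(round(cosine(query_terms, doc_terms), 3))  # 1/sqrt(30) → 0.183
print(round(angle_degrees(0.28), 2))             # → 73.74
```

Here `acos(0.28)` is about 73.74°, close to the reported 73.68°. Since cos 73.68° ≈ 0.281, the small difference is consistent with 0.28 being a rounded value.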
|
|
5
|
|
|
6
|
- Query log – a bridge to connect queries and documents
- Query session := <query text> [clicked documents]
- Log-based query expansion
- Probabilistic correlations between query terms and document terms
- The correlations are then used to select high-quality expansion terms for new queries
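A query session pairs the query text with the documents clicked for it. Below is a minimal sketch of that record and of the click co-occurrence counts a log-based method could build from it. The class and function names, and the naive whitespace tokenizer, are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class QuerySession:
    """One log record: <query text> [clicked documents]."""
    query: str
    clicked_docs: list[str] = field(default_factory=list)


def count_cooccurrences(sessions):
    """Return (term_freq, term_doc_freq) from a list of QuerySession records.

    term_freq[w_q]        = number of sessions whose query contains w_q
    term_doc_freq[w_q][D] = number of those sessions in which D was clicked
    """
    term_freq = Counter()
    term_doc_freq = defaultdict(Counter)
    for s in sessions:
        terms = set(s.query.lower().split())  # naive tokenizer (assumption)
        for t in terms:
            term_freq[t] += 1
            for d in set(s.clicked_docs):
                term_doc_freq[t][d] += 1
    return term_freq, term_doc_freq


# Hypothetical log with three sessions.
log = [
    QuerySession("java tutorial", ["d1", "d2"]),
    QuerySession("java island", ["d3"]),
    QuerySession("java", ["d1"]),
]
term_freq, term_doc_freq = count_cooccurrences(log)
print(term_freq["java"], term_doc_freq["java"]["d1"])  # → 3 2
```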
|
|
7
|
|
|
8
|
- Assumption
- In a query session, the clicked documents are relevant to the given query.
- Reasonable because:
- Users do not click documents randomly.
- Stable from a statistical point of view
- Our previous work on query clustering also supports this assumption.
|
|
9
|
|
|
10
|
|
|
11
|
|
|
12
|
- Term-term correlations are represented as the conditional probability:
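One way to estimate such a conditional probability from the log is to route it through the clicked documents: `P(w_d | w_q) = Σ_D P(w_d | D) · P(D | w_q)`. The sketch below assumes that decomposition, estimates `P(D | w_q)` from click counts normalized over the documents clicked for `w_q`, and estimates `P(w_d | D)` as the term's relative frequency in `D`. These estimators are assumptions for illustration, not necessarily the exact ones on the slide; the function name and data are hypothetical.

```python
from collections import Counter, defaultdict


def term_correlations(sessions, doc_texts):
    """Estimate P(w_d | w_q) = sum over D of P(w_d | D) * P(D | w_q).

    sessions:  list of (query_text, [clicked_doc_ids])
    doc_texts: dict doc_id -> document text
    """
    # f(w_q, D): sessions whose query contains w_q and in which D was clicked.
    f_qd = defaultdict(Counter)
    for query, clicked in sessions:
        for wq in set(query.lower().split()):
            for d in set(clicked):
                f_qd[wq][d] += 1

    # P(w_d | D): relative frequency of each term within each document (assumption).
    p_w_given_doc = {}
    for d, text in doc_texts.items():
        tf = Counter(text.lower().split())
        total = sum(tf.values())
        if total:
            p_w_given_doc[d] = {w: c / total for w, c in tf.items()}

    corr = defaultdict(dict)
    for wq, doc_counts in f_qd.items():
        clicks = sum(doc_counts.values())
        for d, count in doc_counts.items():
            p_doc = count / clicks  # P(D | w_q), normalized over clicked documents
            for wd, p_wd in p_w_given_doc.get(d, {}).items():
                corr[wq][wd] = corr[wq].get(wd, 0.0) + p_wd * p_doc
    return corr


# Hypothetical log and documents.
sessions = [("java tutorial", ["d1"]), ("java", ["d1", "d2"])]
docs = {"d1": "learn java programming", "d2": "java island indonesia travel"}
corr = term_correlations(sessions, docs)
print(round(corr["java"]["programming"], 3))  # 2/3 * 1/3 = 2/9 → 0.222
```

Normalizing `P(D | w_q)` over the clicked documents makes each row `corr[w_q]` sum to 1 whenever every clicked document has text, so the values behave as a proper conditional distribution over document terms.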
|
|
13
|
|
|
14
|
- For a new query, expansion terms are selected with the following formula:
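A minimal sketch of this selection step, assuming the term correlations `P(w_d | w_q)` have already been computed from the log. The scoring rule here is an assumption chosen for illustration: `score(w_d) = ln ∏_{w_q ∈ Q} (P(w_d | w_q) + 1)`, which rewards document terms correlated with several query terms while the `+ 1` keeps a term from being zeroed out by a single unrelated query term. The function name and correlation values are hypothetical.

```python
import math


def expansion_terms(query, corr, k=5):
    """Rank candidate expansion terms for a new query.

    corr: dict w_q -> dict w_d -> P(w_d | w_q), precomputed from the log.
    Assumed score: ln prod over w_q in Q of (P(w_d | w_q) + 1),
    computed as a sum of logs.
    """
    q_terms = set(query.lower().split())
    candidates = set()
    for wq in q_terms:
        candidates.update(corr.get(wq, {}))
    scores = {}
    for wd in candidates - q_terms:  # do not re-add the query's own terms
        scores[wd] = sum(
            math.log(corr.get(wq, {}).get(wd, 0.0) + 1.0) for wq in q_terms
        )
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical correlations.
corr = {
    "java": {"programming": 0.22, "island": 0.08, "coffee": 0.05},
    "tutorial": {"programming": 0.33, "beginner": 0.2, "java": 0.5},
}
print(expansion_terms("java tutorial", corr, k=3))
# → ['programming', 'beginner', 'island']
```

"programming" wins because it is correlated with both query terms; "java" is excluded for this query because it already appears in it.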
|
|
15
|
- A local technique in general
- Computationally feasible
- No initial retrieval required
- Reflects the intentions of most users
- Evolves as user logs accumulate
|
|
16
|
|
|
17
|
- Data
- Two-month query logs (Oct 2000 to Dec 2000)
- 41,942 documents
- 30 evaluation queries (mostly short queries)
- Document relevance judged by human assessors
- Our method is compared with the baseline and with Local Context Analysis (LCA)
|
|
18
|
- Improvement of the log-based method
- 75.42% over the baseline
- 38.95% over LCA
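Taken together, the two figures also imply how LCA itself compares to the baseline, provided both are relative improvements on the same effectiveness measure. The slide does not name the measure, so that is an assumption in this small calculation.

```python
# Relative improvements of the log-based method reported on the slide (percent).
over_baseline = 75.42
over_lca = 38.95

# Assumption: both are relative gains on the same metric, i.e.
#   log_based = baseline * (1 + 0.7542) and log_based = lca * (1 + 0.3895)
lca_over_baseline = ((1 + over_baseline / 100) / (1 + over_lca / 100) - 1) * 100
print(f"{lca_over_baseline:.2f}%")  # → 26.25%
```

Under that assumption, LCA would be about 26% better than the baseline, and the log-based method improves on both.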
|
|
19
|
- Examining 50 expansion terms obtained by the log-based method and LCA.
|
|
20
|
- Phrases are extracted from user logs.
- For TREC queries, phrases may not be as effective as expected.
- This is not the case for short queries.
- Experiments show an average improvement of 11.37% when phrases are used.
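The slide does not say how phrases are extracted from the logs. As a minimal sketch under an assumed, purely frequency-based rule, one can keep multi-word n-grams that recur across logged queries. The function name, thresholds, and queries are hypothetical.

```python
from collections import Counter


def extract_phrases(queries, max_len=3, min_count=2):
    """Return word n-grams (2..max_len words) that occur in at least
    `min_count` logged queries, with their query counts.

    A simple frequency heuristic (assumption), not a specific published method.
    """
    counts = Counter()
    for q in queries:
        words = q.lower().split()
        seen = set()  # count each phrase at most once per query
        for n in range(2, max_len + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_count}


# Hypothetical logged queries.
queries = [
    "new york hotels",
    "cheap new york hotels",
    "new york times",
    "hotels in paris",
]
print(extract_phrases(queries))
```

Frequency alone also admits fragments such as "york hotels"; a practical extractor would typically add an association measure (for example, mutual information between the words) before treating an n-gram as a phrase.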
|
|
21
|
- The log-based query expansion produces significant improvements over both the baseline method and the LCA method.
- Query expansion is of great importance for short queries on the Web.
- Phrases can improve the performance of web search.
|
|
22
|
|
|
23
|
- We show how large the gap is between the query space and the document space.
- A new log-based probabilistic query expansion method is proposed to bridge the gap.
- Experimental results show that our solution is effective, especially for short queries in Web search.
- Log-mining-enhanced web search is a very promising direction.
|
|
24
|
|