1
Probabilistic Query Expansion Using Query Logs
  • Hang Cui
  • Tianjin University, China
  • Ji-Rong Wen
  • Microsoft Research Asia
  • Jian-Yun Nie
  • University of Montreal
  • Wei-Ying Ma
  • Microsoft Research Asia
2
Outline
  • Motivations
  • Central ideas
  • Probabilistic correlations and Query expansion
  • Evaluations
  • Conclusions


3
Word mismatching
  • The word mismatching problem of web searching
    • Inconsistent term usage between user queries and documents
      • The Web is not well-organized
      • Users express queries with their own vocabularies
    • Very short queries (less than two words)
    • Simple (key)word matching doesn’t work well
4
Big gap between the query space and the document space
  • Big gap:
    • 73.68 degrees on average (cosine measure = 0.28)
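As a rough illustration of how such a gap can be measured, the sketch below computes the cosine similarity between a query-space vector and a document-space vector and converts it to an angle. The function names and the toy term weights are assumptions made for illustration, not the data behind the slide. As a sanity check on the reported figures, cos(73.68°) rounds to 0.28.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def angle_degrees(similarity):
    """Angle between two vectors, given their cosine similarity."""
    clamped = max(-1.0, min(1.0, similarity))  # guard against rounding drift
    return math.degrees(math.acos(clamped))

# Hypothetical toy vectors: terms users typed vs. terms found in the document.
query_space = {"car": 1.0, "price": 1.0}
document_space = {"automobile": 2.0, "price": 1.0, "dealer": 1.0}

sim = cosine(query_space, document_space)
gap = angle_degrees(sim)
```

Only the shared term "price" contributes to the dot product here, which is why the two vectors end up far apart even though they describe the same topic.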
6
Exploiting query logs
  • Query log – a bridge to connect queries and documents
    • Query session := <query text> [clicked documents]


  • Log-based query expansion.
    • Probabilistic correlations between query terms and document terms are extracted from the logs
    • The correlations are then used to select high quality expansion terms for new queries
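A minimal sketch of the query session structure defined above (query text plus clicked documents). The tab-separated log line format, the `QuerySession` class, and `parse_session` are hypothetical; the slides do not describe the actual log format.

```python
from dataclasses import dataclass, field

@dataclass
class QuerySession:
    """One query session: the query text and the documents clicked for it."""
    query_text: str
    clicked_documents: list = field(default_factory=list)

def parse_session(line):
    """Parse a hypothetical log line of the form: query<TAB>doc1,doc2,..."""
    query_text, _, clicked = line.rstrip("\n").partition("\t")
    doc_ids = [d.strip() for d in clicked.split(",") if d.strip()]
    return QuerySession(query_text.strip(), doc_ids)

# Example with made-up identifiers.
session = parse_session("windows xp\tdoc12,doc40\n")
```

Keeping the clicked documents attached to the query is what lets later steps link query terms to the terms of the documents users actually chose.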
7
Compared with local feedback and relevance feedback
8
Basic assumption
  • Assumption
    • In a query session, the clicked documents are relevant to the given query.
  • Reasonable because:
    • Users do not click documents randomly.
    • Stable from a statistical view
    • Our previous work on query clustering also supports this assumption.
10
Query sessions as a bridge
11
Correlations between query terms and document terms
12
Term-term probabilistic correlations
  • Term-Term Correlations are represented as the conditional probability:
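The formula itself is not reproduced in this outline. One plausible way to write such a conditional probability, assuming that a document term depends on a query term only through the documents clicked for that query term, is sketched below. The symbols are assumptions: w_i^(q) is a query term, w_j^(d) is a document term, and S is the set of documents clicked in sessions whose query contains w_i^(q).

```latex
\begin{aligned}
P\bigl(w_j^{(d)} \mid w_i^{(q)}\bigr)
  &= \sum_{D_k \in S} P\bigl(w_j^{(d)}, D_k \mid w_i^{(q)}\bigr) \\
  &= \sum_{D_k \in S} P\bigl(w_j^{(d)} \mid D_k, w_i^{(q)}\bigr)\, P\bigl(D_k \mid w_i^{(q)}\bigr) \\
  &\approx \sum_{D_k \in S} P\bigl(w_j^{(d)} \mid D_k\bigr)\, P\bigl(D_k \mid w_i^{(q)}\bigr)
\end{aligned}
```

Under this reading, P(D_k | w_i^(q)) could be estimated from click counts in the query sessions and P(w_j^(d) | D_k) from the term's weight within the document.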
13
Term-term probabilistic correlations (cont.)
14
Query expansion based on term correlations
  • For a new query, the following formula
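The sketch below shows one way the correlations could be estimated from query sessions and then combined to rank expansion terms for a new query. It is an illustration under stated assumptions, not the formula from the slide: P(D | q) is taken as the share of clicks on D among sessions containing q, P(w | D) as the term weight normalised by the largest weight in D, and the combined score as ln of the product of (P(w | q) + 1) over the query terms. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def term_correlations(sessions, doc_term_weights):
    """Estimate P(doc_term | query_term) from (query_terms, clicked_ids) pairs."""
    clicks = defaultdict(Counter)  # query term -> clicked doc id -> click count
    for query_terms, clicked in sessions:
        for q in set(query_terms):
            for doc_id in clicked:
                clicks[q][doc_id] += 1

    corr = defaultdict(dict)  # query term -> doc term -> probability estimate
    for q, doc_counts in clicks.items():
        total = sum(doc_counts.values())
        for doc_id, n in doc_counts.items():
            weights = doc_term_weights.get(doc_id, {})
            if not weights:
                continue
            max_w = max(weights.values())
            for w, weight in weights.items():
                # P(w | D) * P(D | q), summed over the clicked documents.
                corr[q][w] = corr[q].get(w, 0.0) + (weight / max_w) * (n / total)
    return corr

def expansion_terms(query_terms, corr, top_k=3):
    """Rank candidates by ln(prod(P(w | q) + 1)) over all query terms."""
    scores = defaultdict(float)
    for q in query_terms:
        for w, p in corr.get(q, {}).items():
            scores[w] += math.log(p + 1.0)  # log of a product = sum of logs
    candidates = [(w, s) for w, s in scores.items() if w not in query_terms]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_k]

# Made-up sessions and document term weights.
sessions = [(["jaguar"], ["d1"]), (["jaguar", "car"], ["d1", "d2"])]
doc_term_weights = {
    "d1": {"jaguar": 2.0, "automobile": 1.0},
    "d2": {"car": 1.0, "dealer": 1.0},
}
corr = term_correlations(sessions, doc_term_weights)
top = expansion_terms(["jaguar", "car"], corr)
```

Adding 1 before taking the logarithm means a candidate that is unrelated to one query term contributes zero for that term instead of wiping out the whole score, so terms correlated with the query as a whole are favoured.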
15
Characteristics of the log-based probabilistic query expansion
  • A local technique in general.
    • Computationally feasible.
  • No initial retrieval is needed.
  • Reflects most users’ intentions.
  • Evolves as user usage accumulates.
17
Data and methodology
  • Data
    • Two-month query logs (Oct 2000-Dec 2000)
    • 41,942 documents
    • 30 evaluation queries (mostly short queries)
  • Document relevance is judged by human assessors.
  • Our method is compared with the baseline and with Local Context Analysis (LCA).
18
Experiment I – retrieval effectiveness
  • Improvement
    • 75.42% over Baseline
    • 38.95% over LCA
19
Experiment II – quality of expansion terms
  • Examining 50 expansion terms obtained by the log-based method and LCA.
20
Experiment III – impact of phrases
  • Phrases are extracted from user logs.
  • For TREC queries, phrases may not be as effective as expected.
  • This is not the case for short queries.
  • Experiments show an 11.37% improvement on average when using phrases.
21
Summary of evaluation
  • Log-based query expansion produces significant improvements over both the baseline method and the LCA method.
  • Query expansion is of great importance for short queries on the Web.
  • Phrases can improve the performance of web search.
23
Conclusions
  • We show how large the gap is between the query space and the document space.
  • A new log-based probabilistic query expansion method is proposed to bridge the gap.
  • Experimental results show that our solution is effective, especially for short queries in Web searching.
  • Web search enhanced by log mining is a very promising direction.
24
Thanks