Search term analysis: percentages of term use in a Hitwise report

The question of what makes a 'meaningful' number of queries of an on-line resource has been lingering in the back of my mind ever since I did the analysis of the Guggenheim Museum search logs last fall. At issue, really, is how to profile and analyse the long tail of user searching. (Kalevi Kilkki's recent First Monday article, 'A practical model for analyzing long tails', looks at ways to analyse the nature of the tail.)

What brought this back to mind was a Hitwise Newsletter report that included the following analysis of terms that contain 'summer'.

Search Term Analysis: "summer"
Most popular keywords containing the term "summer" for the 4 weeks ending 05/19/07

Rank  Search term                 % of searches
1     summer jobs                 1.03
2     summer love lyrics          0.84
3     summer activities for kids  0.63
4     summer                      0.57
5     summer camps                0.42
6     summer jobs for teens       0.40
7     summer dresses              0.35
8     summer myspace layouts      0.31
9     summer quotes               0.26
10    summer camp                 0.25

None of these terms accounts for much more than 1% of searches, but together they account for just over 5% (5.06%) of all searches in that four-week period. That seems a useful cluster for analysis.
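A running total makes the cluster arithmetic explicit and shows how gradually the share builds down the list, which is the long-tail question in miniature. This is a minimal Python sketch: the percentages are copied from the Hitwise table, and the names `SUMMER_TERMS` and `cumulative_shares` are illustrative only, not part of Hitwise or steve.museum.

```python
# Percentages copied from the Hitwise "summer" table (4 weeks ending 05/19/07).
SUMMER_TERMS = [
    ("summer jobs", 1.03),
    ("summer love lyrics", 0.84),
    ("summer activities for kids", 0.63),
    ("summer", 0.57),
    ("summer camps", 0.42),
    ("summer jobs for teens", 0.40),
    ("summer dresses", 0.35),
    ("summer myspace layouts", 0.31),
    ("summer quotes", 0.26),
    ("summer camp", 0.25),
]


def cumulative_shares(terms):
    """Return (term, running total of % of searches) pairs, in table order."""
    running = 0.0
    result = []
    for term, pct in terms:
        running += pct
        # Round to two places to hide floating-point noise in the sums.
        result.append((term, round(running, 2)))
    return result


if __name__ == "__main__":
    for term, total in cumulative_shares(SUMMER_TERMS):
        print(f"{total:5.2f}%  through '{term}'")
```

The final running total is 5.06%. It only covers the ten terms Hitwise published; the rest of the 'summer' tail does not appear in the report.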

The small set shows us a few of the challenges we'll face in analysing query logs within steve.museum, including problems of semantic clustering:

  • does summer camps (.42%) = summer camp (.25%)?
  • does summer jobs (1.03%) = summer jobs for teens (.40%)?
  • where does summer activities for kids (.63%) fit in?
  • does summer love lyrics (.84%) = summer quotes (.26%)?
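The first of these questions can be answered mechanically, if crudely. A minimal sketch, assuming a hypothetical rule that folds a term ending in 's' into its singular only when that singular form is also in the list: it merges summer camps with summer camp but leaves summer jobs and summer dresses alone. It says nothing about the harder questions, such as whether love lyrics and quotes are the same need.

```python
from collections import defaultdict

# Percentages copied from the Hitwise "summer" table (4 weeks ending 05/19/07).
SUMMER_TERMS = [
    ("summer jobs", 1.03),
    ("summer love lyrics", 0.84),
    ("summer activities for kids", 0.63),
    ("summer", 0.57),
    ("summer camps", 0.42),
    ("summer jobs for teens", 0.40),
    ("summer dresses", 0.35),
    ("summer myspace layouts", 0.31),
    ("summer quotes", 0.26),
    ("summer camp", 0.25),
]


def merge_plurals(terms):
    """Fold a term ending in 's' into its singular, but only if that singular is listed.

    Hypothetical and deliberately naive: it catches 'summer camps' / 'summer camp'
    but misses irregular plurals, and never invents a singular that isn't listed.
    """
    present = {term for term, _ in terms}
    merged = defaultdict(float)
    for term, pct in terms:
        if term.endswith("s") and term[:-1] in present:
            term = term[:-1]
        merged[term] += pct
    # Round to two places, matching the precision of the source table.
    return {term: round(pct, 2) for term, pct in merged.items()}


if __name__ == "__main__":
    for term, pct in merge_plurals(SUMMER_TERMS).items():
        print(f"{pct:5.2f}%  {term}")
```

Merged this way, summer camp rises to 0.67%, which would move it from tenth place to third.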

In 10 terms we've encountered most of the problems of vocabulary normalization:

  • singular / plural forms of terms [camp vs camps]
  • level of specificity / hierarchical structure [jobs vs jobs for teens]
  • synonymy, full or partial [love lyrics vs quotes]
  • what's missing is polysemy [ambiguity in meaning when the same word/character string carries multiple interpretations]
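For level of specificity, a word-level prefix rule gives a rough first pass at hierarchy. The sketch below uses hypothetical helpers (`prefix_parent`, `roll_up`) and treats the seed term 'summer' as the root, so that it doesn't absorb every query in the list. It rolls summer jobs for teens up into summer jobs, but it would miss reorderings such as 'teen summer jobs' and knows nothing about synonymy or polysemy.

```python
# Percentages copied from the Hitwise "summer" table (4 weeks ending 05/19/07).
SUMMER_TERMS = [
    ("summer jobs", 1.03),
    ("summer love lyrics", 0.84),
    ("summer activities for kids", 0.63),
    ("summer", 0.57),
    ("summer camps", 0.42),
    ("summer jobs for teens", 0.40),
    ("summer dresses", 0.35),
    ("summer myspace layouts", 0.31),
    ("summer quotes", 0.26),
    ("summer camp", 0.25),
]


def prefix_parent(term, candidates, root="summer"):
    """Return the longest listed term whose words begin `term`'s words, or None.

    The root term is skipped, since every query in this list starts with it.
    """
    words = term.split()
    best = None
    for cand in candidates:
        cwords = cand.split()
        # Only strictly shorter candidates can be broader parents.
        if cand == root or len(cwords) >= len(words):
            continue
        if words[: len(cwords)] == cwords and (best is None or len(cwords) > len(best.split())):
            best = cand
    return best


def roll_up(terms, root="summer"):
    """Add each more specific term's share to its broader parent, if one is listed."""
    names = [term for term, _ in terms]
    totals = {}
    for term, pct in terms:
        parent = prefix_parent(term, names, root) or term
        totals[parent] = round(totals.get(parent, 0.0) + pct, 2)
    return totals


if __name__ == "__main__":
    for term, pct in roll_up(SUMMER_TERMS).items():
        print(f"{pct:5.2f}%  {term}")
```

Rolled up this way, summer jobs accounts for 1.43%. Note that summer camps and summer camp stay separate, because a word-level prefix match does not treat 'camp' as a prefix of 'camps'.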

The small analysis also points to what we might be able to extract from tags:

  1. facets of meaning clustered around common concepts [in this example, the 'summer' cluster, which pulled together 5% of the queries], and
  2. term co-occurrence as a useful analytical strategy for learning something about user searching behaviour in relation to on-line art museum collections.
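Term co-occurrence is straightforward to compute once each query (or, by the same logic, the set of tags assigned to a single work) is treated as a set of words. This is a minimal sketch, assuming whitespace tokenisation and a hypothetical one-word stopword list. Each word pair is weighted by the share of searches for the query it appears in.

```python
from collections import Counter
from itertools import combinations

# Percentages copied from the Hitwise "summer" table (4 weeks ending 05/19/07).
SUMMER_TERMS = [
    ("summer jobs", 1.03),
    ("summer love lyrics", 0.84),
    ("summer activities for kids", 0.63),
    ("summer", 0.57),
    ("summer camps", 0.42),
    ("summer jobs for teens", 0.40),
    ("summer dresses", 0.35),
    ("summer myspace layouts", 0.31),
    ("summer quotes", 0.26),
    ("summer camp", 0.25),
]

# Hypothetical, deliberately minimal stopword list.
STOPWORDS = {"for"}


def cooccurrence(terms):
    """Weight each unordered word pair by the search share of the queries containing it."""
    pairs = Counter()
    for term, pct in terms:
        # Sorting makes ("jobs", "summer") and ("summer", "jobs") the same key.
        words = sorted(set(term.split()) - STOPWORDS)
        for a, b in combinations(words, 2):
            pairs[(a, b)] += pct
    return pairs


if __name__ == "__main__":
    for (a, b), share in cooccurrence(SUMMER_TERMS).most_common(5):
        print(f"{a} + {b}: {share:.2f}%")
```

On the Hitwise top ten, jobs + summer comes out on top at 1.43%, because it appears in both summer jobs and summer jobs for teens. With tags, the same counting would run over the tag set of each object rather than over query strings.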

Now if I could only get Summer Love out of my head ...

/jt

 

References

Hitwise Newsletter, May 2007 (HTML version). URL: http://www.hitwise.com/news/us200705.html

Kalevi Kilkki, "A practical model for analyzing long tails", First Monday, volume 12, number 5 (May 2007). URL: http://firstmonday.org/issues/issue12_5/kilkki/index.html

J. Trant, "Understanding Searches of an On-line Contemporary Art Museum Catalogue" [Report for steve.museum], December 2006. URL: http://conference.archimuse.com/files/trantSearchTermAnalysis061220a.pdf
