
The question of what makes a 'meaningful' number of queries of an on-line resource has been lingering in the back of my mind ever since I did the analysis of the Guggenheim Museum search logs last fall. At issue, really, is how to profile and analyse the long tail of user searching. (There was a First Monday article about this not long ago, by Kalevi Kilkki, that proposes a practical model for analysing the nature of the tail.)
What brought this back to mind was a Hitwise Newsletter report that included the following analysis of terms that contain 'summer'.
Most popular keywords containing the term "summer" for the 4 weeks ending 05/19/07
Rank Search Term % of searches
1. summer jobs 1.03
2. summer love lyrics 0.84
3. summer activities for kids 0.63
4. summer 0.57
5. summer camps 0.42
6. summer jobs for teens 0.40
7. summer dresses 0.35
8. summer myspace layouts 0.31
9. summer quotes 0.26
10. summer camp 0.25
None of these terms accounts for much more than 1% of searches, but together they make up just over 5% (5.06%) of the searches in the report's four-week window. That seems a useful cluster for analysis.
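The "just over 5%" figure is simply the sum of the ten shares in the table. A quick check in Python, with the values transcribed by hand from the Hitwise table:

```python
# Shares (% of searches) transcribed from the Hitwise table above.
summer_terms = {
    "summer jobs": 1.03,
    "summer love lyrics": 0.84,
    "summer activities for kids": 0.63,
    "summer": 0.57,
    "summer camps": 0.42,
    "summer jobs for teens": 0.40,
    "summer dresses": 0.35,
    "summer myspace layouts": 0.31,
    "summer quotes": 0.26,
    "summer camp": 0.25,
}

# Largest single term, and the combined share of the whole cluster.
largest_term, largest_share = max(summer_terms.items(), key=lambda kv: kv[1])
cluster_share = sum(summer_terms.values())

print(f"largest single term: {largest_term!r} at {largest_share:.2f}%")
print(f"combined share of {len(summer_terms)} terms: {cluster_share:.2f}%")
# prints: largest single term: 'summer jobs' at 1.03%
#         combined share of 10 terms: 5.06%
```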
This small set shows us a few of the challenges we'll face in analysing query logs within steve.museum, including problems of semantic clustering.
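One crude starting point for semantic clustering is to treat two queries as related when they share a content word once the seed term ('summer') and a few stopwords are removed. The sketch below is an illustrative token-overlap heuristic, not the method used in the steve.museum analysis; the stopword list and the function names are hypothetical.

```python
STOPWORDS = {"a", "and", "for", "of", "the"}  # hypothetical minimal stopword list


def content_tokens(term: str, seed: str) -> set[str]:
    """Tokens of a query other than the seed term and stopwords."""
    return {tok for tok in term.lower().split() if tok != seed and tok not in STOPWORDS}


def cluster_by_shared_tokens(terms: list[str], seed: str) -> list[list[str]]:
    """Connected components of the graph 'these two terms share a content token'."""
    clusters: list[tuple[set[str], list[str]]] = []
    for term in terms:
        tokens = content_tokens(term, seed)
        # Every existing cluster sharing a token with this term is merged with it.
        touching = [c for c in clusters if c[0] & tokens]
        clusters = [c for c in clusters if not (c[0] & tokens)]
        merged_tokens = set(tokens).union(*(c[0] for c in touching))
        merged_terms = [t for c in touching for t in c[1]] + [term]
        clusters.append((merged_tokens, merged_terms))
    return [members for _, members in clusters]


terms = [
    "summer jobs", "summer love lyrics", "summer activities for kids", "summer",
    "summer camps", "summer jobs for teens", "summer dresses",
    "summer myspace layouts", "summer quotes", "summer camp",
]
for cluster in cluster_by_shared_tokens(terms, seed="summer"):
    print(cluster)
```

With the ten Hitwise terms, only 'summer jobs' and 'summer jobs for teens' end up together; 'summer camps' and 'summer camp' stay apart because they share no identical token, which is where vocabulary normalization has to come in.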
In just 10 terms we've encountered most of the problems of vocabulary normalization, such as the plural and singular forms 'summer camps' (rank 5) and 'summer camp' (rank 10).
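Singular/plural variation is the easiest of these normalization problems to demonstrate. The sketch below assumes a deliberately naive plural-stripping rule (hypothetical, English-only, and easily fooled; 'series' becomes 'sery'), then groups the Hitwise terms by their normalized form and adds up their shares.

```python
from collections import defaultdict


def singularize(word: str) -> str:
    """Very naive English plural stripping (illustrative only)."""
    if len(word) <= 3 or word.endswith("ss"):
        return word
    if word.endswith("ies"):
        return word[:-3] + "y"          # activities -> activity
    if word.endswith(("sses", "xes", "ches", "shes")):
        return word[:-2]                # dresses -> dress
    if word.endswith("s"):
        return word[:-1]                # camps -> camp
    return word


def normalize(term: str) -> str:
    """Lower-case, collapse whitespace, singularize each token."""
    return " ".join(singularize(tok) for tok in term.lower().split())


def merge_variants(shares: dict[str, float]) -> dict[str, tuple[list[str], float]]:
    """Group terms with the same normalized form and add up their shares."""
    groups: dict[str, list[str]] = defaultdict(list)
    for term in shares:
        groups[normalize(term)].append(term)
    return {key: (members, sum(shares[m] for m in members)) for key, members in groups.items()}


shares = {
    "summer jobs": 1.03, "summer love lyrics": 0.84,
    "summer activities for kids": 0.63, "summer": 0.57,
    "summer camps": 0.42, "summer jobs for teens": 0.40,
    "summer dresses": 0.35, "summer myspace layouts": 0.31,
    "summer quotes": 0.26, "summer camp": 0.25,
}
for key, (members, total) in merge_variants(shares).items():
    if len(members) > 1:
        print(f"{key!r}: {members} -> {total:.2f}%")
# prints: 'summer camp': ['summer camps', 'summer camp'] -> 0.67%
```

Only the two camp variants collapse here; merged, they account for 0.67% of searches, more than 'summer activities for kids' (0.63%). A real system would more likely use an established stemmer or lemmatizer than hand-written suffix rules.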
This small analysis also points to what we might be able to extract from tags.
Now if I could only get Summer Love out of my head...
/jt
References
Hitwise Newsletter, May 2007 (HTML version): http://www.hitwise.com/news/us200705.html
Kalevi Kilkki, "A practical model for analyzing long tails", First Monday, volume 12, number 5 (May 2007): http://firstmonday.org/issues/issue12_5/kilkki/index.html
J. Trant, "Understanding Searches of an On-line Contemporary Art Museum Catalogue" [Report for steve.museum], December 2006: http://conference.archimuse.com/files/trantSearchTermAnalysis061220a.pdf