OJNI
An Analysis of Web Site Usage

Rod Ward

Introduction

The Internet in general, and the World Wide Web in particular, is becoming an increasingly important source of health-related information for professionals and students, reflecting wider use amongst all sections of society (DofH, 1999, section 2.6).

Nursing & Health Care Resources on the Net is a web site that attempts to provide an index of over 1500 websites, mailing lists and newsgroups relevant to nurses and other health professionals. It has been running since 1993. At that time, an extensive search of the Internet found only 4 nursing-related sites and a couple of mailing lists, with no central index. During 1994 my own students and close colleagues used the site, each providing about 10 visits per week. As people found out about it, they suggested further sites (and mailing lists and newsgroups) to be included, and the site grew. The site now lists over 1500 net resources for nurses and other health professionals and took over 100,000 hits in 1998.

The site has recently been upgraded, using JavaScript to remove the frames-based design, in the hope of improving ease of navigation and download times. It has a high graphical content, with links to web sites marked with their logo as well as a short description. The site is unashamedly an eclectic collection of resources, intended to help novice net users from the healthcare community to begin to explore some of the resources available.

Research questions

The purpose of this analysis was to identify who was accessing the site (by part of the world, etc.), when they accessed it, what features they used, and the trends in use over the last 5 years.

Noonan (1997)
identified several questions for finding out about the people who use a web site:

· Where do people go online?
· How long do they stay on a particular site or page?
· What software are they using?
· Where do they get Internet access?

She also highlighted the importance of users' behaviour patterns for web site designers: “If you know what people like to see online, you can tailor your service to meet their needs. If the logs reveal that visitors keep looking for a certain type of material, maybe it should be provided. If the logs reveal that folks spend a minute and are off to something more interesting, maybe it's time to think about adding something like a searchable database or quiz to keep them a while longer. Folks "surf" the net for the same reason they switch channels, they are looking for a reason to stop.” She summarised
the reasons for using some sort of log file as being: “You want to know what's popular on your pages (and what isn't), who uses your system and how they use it, and how you can improve the service.”

Once it is accepted that the users of a web site need to be tracked, a variety of tools can be used, all of which have advantages and disadvantages. None of them is very accurate. Linder (1998) summarised this: “It's
been said, rightly, that trying to draw conclusions from web server statistics
is like trying to nail Jell-o to the wall. Is
that a fault with the software? The hardware? The operation of the system? No.
The inaccuracy of the numbers is simply a by-product of the way the Web
functions. Even the most technically advanced sites have only a general idea of
the amount and nature of the traffic on their servers.”

Linder (1998) goes further: “Most
large commercial sites such as America Online, Prodigy, and CompuServe, use
large "caches" on the machines
they use to service web requests. That means that once a user looks up one of
our pages, it hangs around in memory at that site in case someone else wants to
look at it. That way, things run faster - if person "A" looks at our
genealogy page, and two minutes later, person "B" wants to look at it,
person "B" gets it right away because it is still in memory from
person "A", and the machine doesn't have to get it from us again. What
effect does this have? It has the effect of reducing the number of reported "hits" we receive from those sites. It's entirely possible that we only register one "hit" for every ten times someone at, for example, CompuServe, looks at the genealogy page. No, there is no way whatsoever to determine exactly what effect this has on our numbers - it could be a factor of two, it could be orders of magnitude. Probably somewhere in the middle.”

Methods
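Linder's point about shared caches, which limits every counting method described in this section, can be illustrated with a small simulation. This is a simplification rather than a model of any real proxy: it assumes a single cache that keeps a page for a fixed, hypothetical time-to-live (`ttl`, in minutes) and fetches it from the origin server only when its copy is missing or expired, so only those fetches are seen as hits.

```python
def origin_hits(request_times, ttl):
    """Count how many requests reach the origin server through one shared cache.

    request_times: minutes at which users behind the cache view the page.
    ttl: hypothetical time (minutes) the cache keeps its copy before refetching.
    """
    hits = 0
    fetched_at = None  # when the cache last fetched the page from the origin
    for t in sorted(request_times):
        if fetched_at is None or t - fetched_at >= ttl:
            hits += 1  # cache miss: the origin server records a hit
            fetched_at = t
        # otherwise the cache serves its own copy and the origin sees nothing
    return hits


# Ten users at one large provider view the page two minutes apart.
views = [2 * i for i in range(10)]  # minutes 0, 2, ..., 18
print(origin_hits(views, ttl=60))  # 1
print(origin_hits(views, ttl=5))   # 4
```

With a one-hour cache lifetime, ten views register as a single hit, the kind of ratio Linder describes; the real ratio depends on cache settings that the site owner cannot see.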
Several different counters, trackers and log files have been used to measure page accesses, or hits, and to collect a variety of data about the user. These work in different ways. Log files are created by the server, which counts how many requests are received for a file. Counters and trackers require some HTML code to be placed on a page. When a request for this page is received from any machine connected to the Internet, the code causes an additional request to be sent to the server that runs the counter or tracker. Counters then increment by one. Trackers, in addition, record some details about the machine requesting the page, e.g. its IP address (which identifies the machine and, through its domain, the country or network being used), how the request was generated (e.g. a link from another page or a search engine) and the set-up of the user (e.g. the browser being used, screen resolution, etc.).

Each method of measuring accesses and user/machine details has advantages and disadvantages, and its accuracy is influenced by the state of the Internet connection, its placement on a web page and the configuration of the user's system.

In this context the terms hit, visit and access are used interchangeably to mean one request received for a particular page. Data on page impressions or the volume of data transferred can provide additional information but were not used in this study.

The primary sources of this data have been the Nedstat (http://www.nedstat.net) and Extreme (http://www.extreme-dm.com) counters, which are both available free but require the owner of the web site to insert some HTML code into the site to make them work.

If users are accessing the site from a network that uses a cache (e.g. a university or commercial Internet Service Provider), their page views will not be recorded, because the page is served from the copy held on the local network. It is therefore likely, in those cases, that the total number of hits considerably exceeds the number recorded by the trackers.

Results
The site itself was one of the first for nursing on the World Wide Web. It was started in 1994 and received about 10,000 hits in its first 2 years (1994-1996). The site took over 100,000 hits during 1998. Access statistics were examined to build up a user profile: where hits were coming from, the times of day and days of the week, and which browsers and operating systems were being used.

Hits per day

The number of hits on the site varies with the day of the week, with a visible drop at weekends, as can be seen in the example in Table 1.
Table 1
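A weekday tally like the one in Table 1 can be produced from the time of each hit. The sketch below assumes hits are available as Python `datetime` values (the trackers used here produced their own summaries, so this is only an illustration with made-up data). The weekend share gives a quick check of the weekend drop: hits spread evenly over the week would put about 2/7, roughly 29%, on Saturday and Sunday.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # weekday() order


def hits_by_weekday(timestamps):
    """Tally hits by day of the week, Monday first."""
    tally = Counter(DAY_NAMES[t.weekday()] for t in timestamps)
    return {day: tally[day] for day in DAY_NAMES}


def weekend_share(timestamps):
    """Fraction of hits falling on a Saturday or Sunday (0.0 if there are none)."""
    if not timestamps:
        return 0.0
    weekend = sum(1 for t in timestamps if t.weekday() >= 5)
    return weekend / len(timestamps)


# Hypothetical hits for the week beginning Monday 4 January 1999.
hits = [datetime(1999, 1, day, 12) for day in (4, 4, 5, 5, 6, 7, 7, 8, 9, 10)]
print(hits_by_weekday(hits))
print(weekend_share(hits))  # 0.2
```

A weekend share well below 2/7 is consistent with the weekday-heavy pattern described above.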
Table 2 shows the hits coming to the site by hour of the day in Greenwich Mean Time (GMT) between February 1998 and July 1999. The peak level of hits occurred during 15.00-15.59 and the lowest level during 07.00-07.59.
Table 2
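Finding the busiest and quietest hour bands, as reported for Table 2, amounts to tallying hits by GMT hour. A minimal sketch, assuming each hit is a timezone-aware `datetime` (the example data are made up):

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_profile(timestamps):
    """Hits in each GMT hour band: index 0 is 00.00-00.59, index 23 is 23.00-23.59.

    Timestamps must be timezone-aware so they can be converted to GMT (UTC).
    """
    tally = Counter(t.astimezone(timezone.utc).hour for t in timestamps)
    return [tally[hour] for hour in range(24)]


def band(hour):
    """Format an hour band in the style used in the text, e.g. 15 -> '15.00-15.59'."""
    return f"{hour:02d}.00-{hour:02d}.59"


def peak_and_trough(profile):
    """Busiest and quietest hour bands; ties go to the earliest hour."""
    peak = max(range(24), key=lambda h: profile[h])
    trough = min(range(24), key=lambda h: profile[h])
    return band(peak), band(trough)


# Made-up data: one hit in every hour except 03.00, plus two extra at 16.00.
hits = [datetime(1999, 2, 1, h, 30, tzinfo=timezone.utc) for h in range(24) if h != 3]
hits += [datetime(1999, 2, 1, 16, 45, tzinfo=timezone.utc)] * 2
print(peak_and_trough(hourly_profile(hits)))  # ('16.00-16.59', '03.00-03.59')
```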
The hits for different sections of the site also vary by time of day, e.g. Nursing-UK (Table 3) and Nursing-North-America (Table 4). Table 3 shows the hits coming to the site by hour of the day in Greenwich Mean Time (GMT) from the UK between January 1998 and July 1999. The peak level of hits occurred during 14.00-14.59 and the lowest level during 04.00-04.59.

Table 3
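Per-section patterns such as those in Tables 3 and 4 need each hit to carry both the page requested and its time. The sketch below assumes, unlike this study, that hits come from a server log in the NCSA Common Log Format rather than from the Nedstat or Extreme trackers; the host names and page paths are hypothetical. It reports the busiest GMT hour for each page.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

# host ident authuser [date] "method path protocol" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3}) \S+')


def peak_hour_by_page(lines):
    """Return {page: busiest GMT hour} from Common Log Format lines.

    Each request counts as one hit; lines that do not parse are skipped.
    """
    hours = defaultdict(Counter)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        hours[m.group(3)][when.astimezone(timezone.utc).hour] += 1
    return {page: tally.most_common(1)[0][0] for page, tally in hours.items()}


log = [
    'pc1.example.ac.uk - - [15/Jan/1999:14:02:11 +0000] "GET /uk.htm HTTP/1.0" 200 5120',
    'pc7.example.ac.uk - - [15/Jan/1999:14:40:02 +0000] "GET /uk.htm HTTP/1.0" 200 5120',
    'pc2.example.ac.uk - - [15/Jan/1999:09:15:30 +0000] "GET /uk.htm HTTP/1.0" 200 5120',
    'dial3.example.com - - [15/Jan/1999:15:20:00 -0500] "GET /na.htm HTTP/1.0" 200 7300',
    'not a log line',
]
print(peak_hour_by_page(log))  # {'/uk.htm': 14, '/na.htm': 20}
```

Converting every timestamp to GMT before bucketing matters here: the last request was logged at 15.20 local time but falls in the 20.00-20.59 GMT band.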
Table 4 shows the hits coming to the site by hour of the day in Greenwich Mean Time (GMT) from North America between January 1998 and July 1999. The peak level of hits occurred during 20.00-20.59 and the lowest level during 10.00-10.59.
Table 4
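Converting the North American peak and trough in Table 4 from GMT to local clock time helps in interpreting them. The sketch below takes the 5-7 hour offset mentioned in the discussion at face value, using fixed offsets of 5 and 7 hours behind GMT and ignoring daylight saving time; the zone labels are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets behind GMT; daylight saving time is deliberately ignored.
OFFSETS = {"GMT-5": timezone(timedelta(hours=-5)),
           "GMT-7": timezone(timedelta(hours=-7))}


def local_hours(gmt_hour):
    """Local clock hour, for each assumed offset, of a given GMT hour."""
    t = datetime(1999, 1, 15, gmt_hour, tzinfo=timezone.utc)
    return {label: t.astimezone(tz).hour for label, tz in OFFSETS.items()}


print(local_hours(20))  # {'GMT-5': 15, 'GMT-7': 13}  (peak band)
print(local_hours(10))  # {'GMT-5': 5, 'GMT-7': 3}  (lowest band)
```

On these assumptions the North American peak falls in the early to mid afternoon and the quietest band in the small hours, consistent with use during the working day.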
The hits per month from May 1998 to July 1999 are shown in Table 5. They reveal a gradual rise during 1998, a fall around the Christmas period and then a large rise in January 1999.

Table 5
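The size of these monthly swings can be expressed as percentage changes. The sketch below uses only the monthly totals quoted in the discussion (3500 hits in August 1998, 7700 in October, 4300 over the Christmas period and 8400 in January 1999); those figures are rounded, so the results are approximate.

```python
def percent_change(old, new):
    """Percentage change from one monthly total to another."""
    return 100 * (new - old) / old


# Monthly totals as quoted in the text (rounded figures).
print(round(percent_change(3500, 7700), 1))  # 120.0  (August to October 1998)
print(round(percent_change(4300, 8400), 1))  # 95.3  (Christmas 1998 to January 1999)
```

Both the October rise and the January 1999 rise roughly doubled the preceding total.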
Visitors to the site come from all over the world, as shown in Table 6. The largest number came from UK (.uk) domains, followed by network (.net), commercial (.com) and unknown domains, which it is impossible to classify. The next largest group came from the USA, primarily from educational institutions (.edu), followed by Australia (.au) and Canada (.ca). These are grouped by continent in Table 7.

Table 6 Domains/Countries from which hits are coming - 13 May 98-16 July 99 (Source Extreme)
+ over 80 other countries/domains with fewer than 100 (0.1%) hits.

Table 7 Continents from which hits are coming - 13 May 98-16 July 99 (Source Extreme)
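Grouping visitors by domain and continent, as in Tables 6 and 7, starts from the last label of each visiting host name. The sketch below is illustrative only: its lookup tables are deliberately partial and hypothetical (a real analysis would need the full list of country-code domains), hosts reported only as numeric IP addresses are counted as unknown, and generic domains such as .net and .com cannot be placed on a continent.

```python
from collections import Counter

# Deliberately partial, illustrative lookup tables.
DOMAIN_LABELS = {"uk": "United Kingdom", "au": "Australia", "ca": "Canada",
                 "edu": "US Educational", "net": "Network", "com": "Commercial"}
CONTINENTS = {"uk": "Europe", "au": "Oceania", "ca": "North America",
              "edu": "North America"}


def top_level_domain(host):
    """Last label of a host name, or None for an unresolved numeric IP address."""
    label = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return None if label.isdigit() else label


def tally_domains(hosts):
    """Count hits by domain label and by continent."""
    domains, continents = Counter(), Counter()
    for host in hosts:
        tld = top_level_domain(host)
        domains[DOMAIN_LABELS.get(tld, tld or "Unknown")] += 1
        # .net, .com and numeric addresses cannot be placed on a continent
        continents[CONTINENTS.get(tld, "Unknown")] += 1
    return domains, continents


hosts = ["pc1.example.ac.uk", "dial9.example.net", "www.example.com",
         "192.0.2.7", "lib.example.edu", "gw.example.com.au"]
domains, continents = tally_domains(hosts)
print(domains.most_common())
print(continents.most_common())
```

Where full host names are available, applying the same idea to the second-to-last label would separate .ac.uk from .nhs.uk addresses, which the tracker used here did not report.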
Analysis of how users arrive at the site showed that most click on a link on another web site rather than using a search engine, as shown in Table 8. Very few of the users recorded arrived via email or Usenet newsgroup messages. The most popular search engines used are shown in Table 9; 1457 different keywords were used, and the most popular are shown in Table 10. Over 2500 other web sites have links to this one, and the sites providing the most referrals through hyperlinks to this site are shown in Table 11.

Table 8 Referral Sources 13 May 98-16 July 99 (Source Extreme)
Table 9 Most popular search engines 13 May 98-16 July 99 (Source Extreme)
+ 12 others with less than 1.00%
Table 10 Keywords used in search engines 13 May 98-16 July 99 (Source Extreme)
Table 11 Top referring pages 13 May 98-16 July 99 (Source Extreme)
Note: This does not take into account referrals from multiple pages within one domain e.g. all pages of Nursing Standard Online.
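Referral data of the kind summarised in Tables 8-11 come from the referring URL that browsers usually send with each request. The sketch below is a hypothetical classifier: the search-engine host name and its query parameter (`q`) are placeholders, not the real settings of Excite, Yahoo or any other engine. A missing referrer is reported as such, because typed addresses, bookmarks and many e-mail programs generally send none, which is one reason e-mail referrals are hard to count.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Placeholder: search-engine host -> (display name, query parameter holding the terms).
SEARCH_ENGINES = {"search.example.com": ("ExampleSearch", "q")}


def classify_referrer(referrer):
    """Return (source, keywords) for one referring URL."""
    if not referrer or referrer == "-":
        return "No referrer", []
    url = urlparse(referrer)
    host = (url.hostname or "").lower()
    if host in SEARCH_ENGINES:
        name, param = SEARCH_ENGINES[host]
        terms = parse_qs(url.query).get(param, [""])[0]
        return name, terms.lower().split()
    return "Link: " + host, []


referrers = ["http://search.example.com/find?q=Nursing+UK",
             "http://www.example.ac.uk/links.htm",
             "http://search.example.com/find?q=nurse+education",
             "-"]
sources, keywords = Counter(), Counter()
for ref in referrers:
    source, words = classify_referrer(ref)
    sources[source] += 1
    keywords.update(words)
print(sources.most_common())
print(keywords.most_common(3))
```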
Visitors to the site were using a wide range of browsers (see Table 12), operating systems (see Table 13), screen resolutions (see Table 14) and screen colours (see Table 15). Microsoft Internet Explorer (MSIE), in all its versions, was the most frequently reported, with 49.03% of users; Netscape was second with 42.13%, and another 8.83% used other browsers. The full breakdown, including version numbers, is shown in Table 12. The vast majority of hits came from Windows operating systems (96.01%), with 2.66% from Macs, as shown in Table 13. Windows 95 was the most used system, followed by Windows 98 and Windows 3.1. The screen resolutions and colours in use also varied widely, as shown in Tables 14 and 15.
Table 12 Browsers used 13 May 98-27 July 99 (Source Extreme)
Table 13 Operating systems 13 May 98-17 July 99 (Source Extreme)
Table 14 Screen Resolutions 13 May 98-27 July 99 (Source Extreme)
Table 15 Screen Colours 13 May 98-27 July 99 (Source Extreme)
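Browser and operating-system shares such as those in Tables 12 and 13 are derived from the User-Agent string that each browser sends. The rules below are a rough, hypothetical sketch based on the general shape of late-1990s strings: MSIE identified itself as "Mozilla/... (compatible; MSIE ...)", while Netscape sent "Mozilla/..." without "compatible". Some browsers disguised themselves as MSIE or Netscape, so any such classification is approximate.

```python
from collections import Counter


def browser_family(user_agent):
    """Rough browser family from a late-1990s User-Agent string."""
    if "MSIE" in user_agent:
        return "MSIE"
    if user_agent.startswith("Mozilla/") and "compatible" not in user_agent:
        return "Netscape"
    return "Other"


def os_family(user_agent):
    """Rough operating-system family (matches e.g. 'Win95' and 'Windows 98')."""
    ua = user_agent.lower()
    if "win" in ua:
        return "Windows"
    if "mac" in ua:
        return "Macintosh"
    return "Other"


def shares(labels):
    """Percentage share of each label, to two decimal places as in the tables."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


agents = ["Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)",
          "Mozilla/4.5 [en] (Win98; I)",
          "Mozilla/4.61 (Macintosh; I; PPC)",
          "Lynx/2.8.1rel.2 libwww-FM/2.14"]
print(shares(browser_family(a) for a in agents))  # {'MSIE': 25.0, 'Netscape': 50.0, 'Other': 25.0}
print(shares(os_family(a) for a in agents))  # {'Windows': 50.0, 'Macintosh': 25.0, 'Other': 25.0}
```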
Discussion

The known inaccuracy of the counter and tracker, and the lack of comparable published data from similar sites, make it difficult to explore the significance of these results fully; however, some inferences can be drawn about the users of the site.

The visible fall
in hits at the weekend is likely to be due to the intended academic nature of the site and to the way in which many users probably access it, through an Internet connection within a university or college or from a work setting. However, a large number of users are from the UK, where Internet connections from hospitals are known to be poor (Kerrimore, 1999), and the number of people accessing the Internet from home is increasing (DofH, 1999), so this pattern may change. The times of day at which the site is hit reflect a UK or Western European user profile and users in work or academic settings, with most hits during the working day (09.00-17.00 GMT) and the evening (18.00-22.00). This is particularly true for the Nursing-UK section; overlaid on this is the pattern from North America, where hits run 5-7 hours behind the UK.

The variation of
hits between months of the year shows a gradual rise from May to July 1998 (2000 to 4000 hits), a decrease in August (3500 hits), a sharp increase in October (7700 hits) and then a fall over the Christmas period of 1998 (4300 hits), perhaps reflecting the academic calendar. Usage peaked at 8400 hits in January 1999. This rise coincided with an upgrade of the site, followed by a publicity campaign on mailing lists and newsgroups and resubmission of the site to search engines and directories, as described elsewhere (Ward, 1998).

Analysis of the
countries from which visitors arrive at the site is more difficult, with over 20% of hits from unknown domains and another 40% from .net or .com domains, which cannot be resolved to a particular country but are likely to be predominantly from North America. The single largest country was the UK, probably reflecting my own biases and the publicity the site has received within the UK nursing community. In addition, the number of nursing and health care related web sites in North America far outstrips that elsewhere in the world, which may mean that this site faces less competition in the UK or appeals to a different group of users. Unfortunately, this tracker cannot identify how many of the UK hits come from academic (.ac) or National Health Service (.nhs) addresses. The preponderance of North American and European users reflects the difference in access capabilities between these regions and Africa, Asia, and Central and South America. The fact that the site is only in English may also discourage use from some parts of the world.

The way in
which users find this or any other site, from the approximately 500 million available on the Internet, is complex and would benefit from further research. Many Internet commentators suggest that most sites are found through search engines. This is not reflected in the statistics for this site, which show almost 75% of visitors arriving via a hyperlink from another web site. This shows the importance of ensuring that web sites are listed on the links pages of other related sites. Over 2500 sites link to this one, with the Nurse site (http://medweb.bham.ac.uk/nursing) providing nearly 5% of referrals. The Nurse site is no longer maintained, and the first site to which it redirects users is Nursing & Health Care Resources on the Net. Of those who found the site via a search engine, almost half used Excite; the next largest was Yahoo, which links to this site from various relevant sections. This shows the importance for web site creators of going through the submission process for each search engine and of using metatags and resubmission to keep their entry among the first 10 or 20 results returned for relevant keywords (Ward, 1998). The most popular search terms that led people to this site were nursing, care, health, nurses, UK and nurse, although over 1000 other terms had been used in searches.

The numbers
arriving after email messages or Usenet newsgroup postings were very small, perhaps showing that this is not a particularly productive way of promoting a web site.

The variations in operating systems, browsers, screen resolutions and screen colours are also relevant for web site designers. Unless some users are to be excluded, a site's design should be tested in a range of these configurations, and special functions specific to Netscape, MSIE or other browsers should be avoided.

Conclusions

It is important
for both Internet users and web site providers to be aware of several issues that affect web site access. The user profile of a site will generally reflect its content. Sites need to be listed in search engines, and other site owners need to be encouraged to link to related sites. Users who are aware of the variations in usage by day of the week and time of day can gain faster access to materials. Usage patterns will vary with country and type of access.

The web is an
international medium, and this site is only available in English. This could have had a large impact on the number of countries represented. This international flavour should be taken into account, recognising cultural and language differences.

The technical variations in users' machines, operating systems and configurations mean that site designers must take account of different user needs and test their materials on various set-ups to ensure they can be used by as many people as possible.

This paper has
used freely available technologies to identify some characteristics of the users of one particular web site. Further study is needed to explore the profile of Internet users and their behaviours. Once these characteristics are identified, the creators and maintainers of World Wide Web resources need to recognise the variations and similarities in users, and the capabilities of their software, to ensure that information provision more closely meets the needs of the user.