OJNI

An Analysis of Web Site Usage

Rod Ward 

Rod.Ward@Sheffield.ac.uk

Introduction.

The Internet in general, and the World Wide Web in particular, is becoming an increasingly important source of health-related information for professionals and students, reflecting wider use amongst all sections of society.

“By 1998 some 6 million people had used the Internet from home, a 76% increase on the previous year, and access and use continues to rise.”

DofH 1999 section 2.6

Nursing & Health Care Resources on the Net is a web site that attempts to provide an index of over 1500 web sites, mailing lists and newsgroups relevant to nurses and other health professionals. It has been running since 1993. At that time, an extensive search of the Internet found only 4 nursing-related sites and a couple of mailing lists, with no central index. During 1994 my own students and close colleagues used the site, with each providing about 10 visits per week. As people started to find out about it, they suggested further sites (and mailing lists and newsgroups) to be included, and the site grew.

The site now lists over 1500 net resources for nurses and other health professionals and took over 100,000 hits in 1998. The site has recently been upgraded using JavaScript to remove the frames-based design and, it is hoped, to improve ease of navigation and download times. It has a high graphical content, with links to web sites being marked with their logo as well as a short description. The site is unashamedly an eclectic collection of resources, intended to help novice net users from the healthcare community to begin to explore some of the resources available.

 

Research questions

 

The purpose of this analysis was to identify who was accessing the site (for example, from which part of the world), when they accessed it, what features they used, and trends in use over the last 5 years.

Noonan (1997) identified several questions for trying to find out about the people who use a web site:

·     Where do people go online?

·     How long do they stay on a particular site or page?

·     What software are they using? 

·     Where do they get Internet access?

and highlighted the importance of users' behaviour patterns for web site designers:

“If you know what people like to see online, you can tailor your service to meet their needs. If the logs reveal that visitors keep looking for a certain type of material, maybe it should be provided. If the logs reveal that folks spend a minute and are off to something more interesting, maybe it's time to think about adding something like a searchable database or quiz to keep them a while longer. Folks "surf" the net for the same reason they switch channels, they are looking for a reason to stop.” 

She summarised the reasons for using some sort of log file as being:

“You want to know what's popular on your pages (and what isn't), who uses your system  and how they use it, and how you can improve the service.” 

Once one accepts that it is necessary to track the users of a web site, there are a variety of different tools that can be used, all of which have advantages and disadvantages. None of them is very accurate. Linder (1998) summarised this:

“It's been said, rightly, that trying to draw conclusions from web server statistics is like trying to nail Jell-o to the wall.  As you look at the statistics, I cannot stress strongly enough that they should only be used as very general trends and not as gospel truth. These numbers could easily be off (on the low side) by significant percentages. They are not, by any means, "hard numbers." 

Is that a fault with the software? The hardware? The operation of the system? No. The inaccuracy of the numbers is simply a by-product of the way the Web functions. Even the most technically advanced sites have only a general idea of the amount and nature of the traffic on their servers.” 

Linder (1998) goes further:

“Most large commercial sites such as America Online, Prodigy, and CompuServe, use large "caches" on the machines they use to service web requests. That means that once a user looks up one of our pages, it hangs around in memory at that site in case someone else wants to look at it. That way, things run faster - if person "A" looks at our genealogy page, and two minutes later, person "B" wants to look at it, person "B" gets it right away because it is still in memory from person "A", and the machine doesn't have to get it from us again. What effect does this have? 

It has the effect of reducing the number of reported "hits" we receive from those sites. It's entirely possible that we only register one "hit" for every ten times someone at, for example, CompuServe, looks at the genealogy page. No, there is no way whatsoever to determine exactly what effect this has on our numbers - it could be a factor of two, it could be orders of magnitude. Probably somewhere in the middle.”

 

Methods 

Several different counters, trackers, and log files have been used to measure the page accesses or hits and to collect a variety of data about the user. These work in different ways. Log files are created by the server, which counts how many requests are received for a file. Counters and trackers require some HTML code to be placed on a page. When a request for this page is received from any machine connected to the Internet, an additional message is sent to the machine that acts as the server for the counter or tracker. Counters will then increment by one. Trackers will, in addition, record some details about the machine requesting the page, e.g. the IP address (the identification of the machine requesting the page, which identifies the country or network being used), how the request was generated (e.g. a link from another page or a search engine) and the set-up of the user (e.g. browser being used, screen resolution etc.).
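As a rough illustration of the log file approach described above, the following Python sketch counts successful page requests per file from lines written in the Common Log Format. The sample lines, host names and page names are invented for illustration and are not taken from this site's logs, and the server and log format actually used for the site are not stated in this paper.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def hits_per_page(lines):
    """Count successful GET requests per path, one 'hit' per request."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("method") == "GET" and match.group("status") == "200":
            counts[match.group("path")] += 1
    return counts

# Hypothetical log lines, invented for illustration only.
sample = [
    'host1.example.ac.uk - - [13/May/1998:14:05:10 +0000] "GET /index.html HTTP/1.0" 200 5120',
    'host2.example.com - - [13/May/1998:14:06:45 +0000] "GET /nursing-uk.html HTTP/1.0" 200 7300',
    'host1.example.ac.uk - - [13/May/1998:14:07:02 +0000] "GET /index.html HTTP/1.0" 200 5120',
    'host3.example.net - - [13/May/1998:14:09:30 +0000] "GET /missing.html HTTP/1.0" 404 210',
]

page_hits = hits_per_page(sample)
for path, count in page_hits.most_common():
    print(path, count)
```

A counter or tracker differs from this in that the count is kept on a third-party machine, which is why it depends on the extra request from the visitor's browser arriving successfully.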

 Each method of measuring accesses and user/machine details has advantages and disadvantages, and their accuracy is influenced by the state of the Internet connection, their placement on a web page and the configuration of the user's system. 

The terms hit, visit or access in this context are used interchangeably, to indicate one request being received for that particular page. Alternative data about page impressions or volume of data transferred can provide additional information but were not used in this study. 

The primary sources of this data have been the Nedstat (http://www.nedstat.net) and Extreme (http://www.extreme-dm.com) counters, which are both available free but require the owner of the web site to insert some HTML code into the site to make them work.

If users are accessing the site from a network that uses a cache (e.g. a university or commercial Internet Service Provider), their page views may not be recorded, because the page is served from a copy already downloaded onto the local network. It is therefore likely, in those cases, that the total number of hits considerably exceeds the numbers recorded by the trackers.
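The effect of caching on recorded hits can be sketched with a very simple model. The Python example below assumes, purely for illustration, that a caching network keeps a page indefinitely once it has been fetched, so that only the first request from that network reaches the counter. Real caches expire their copies, and the network names and request mix here are invented.

```python
def recorded_hits(requests, caching_networks):
    """Return how many requests would reach the origin server or counter.

    requests: iterable of (network, page) tuples in arrival order.
    caching_networks: networks assumed to serve repeat requests locally.
    """
    already_cached = set()
    recorded = 0
    for network, page in requests:
        if network in caching_networks:
            if (network, page) in already_cached:
                continue  # served from the local cache; never seen by the counter
            already_cached.add((network, page))
        recorded += 1
    return recorded

# Hypothetical traffic: ten views through one caching ISP, three direct views.
requests = [("isp-a", "/index.html")] * 10 + [
    ("direct-1", "/index.html"),
    ("direct-2", "/index.html"),
    ("direct-3", "/index.html"),
]

actual = len(requests)
seen = recorded_hits(requests, caching_networks={"isp-a"})
print(f"{seen} of {actual} page views recorded")
```

In this invented case only 4 of 13 page views are recorded, which is in the spirit of Linder's "one hit for every ten" remark, but the real ratio for this site cannot be determined from the tracker data.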

 

Results 

The site was one of the first for nursing on the World Wide Web. It was started in 1994 and received about 10,000 hits in the first 2 years (1994-1996). It was apparent from the exposure of the site that we should explore the nature of its usage. In 1998, a counter and a tracker were implemented to gather statistical data with which to analyze this web site's usage. Each was put into use when it became available to the site. The Nedstat counter was added to the site on July 2, 1998 and by December 31, 1998 had recorded 51,591 hits. The Extreme tracker was added on May 13, 1998 and by December 31, 1998 had recorded 39,091 unique visitors and a total of 118,512 page impressions (this includes cases where the same user has visited more than once or reloaded the page). Both the counter and the tracker are still in use.

The site took over 100,000 hits during 1998. Access statistics were examined to build up a user profile: where hits were coming from, at what times of day and on which days of the week, and which browsers and operating systems were being used.
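The difference between unique visitors and page impressions can be made concrete with a short Python sketch. It assumes that a visitor is identified by some key such as an IP address, which is an assumption about how the tracker works rather than something documented here, and the sample requests are invented. The final ratio uses the two Extreme figures quoted above.

```python
def visitors_and_impressions(requests):
    """requests: iterable of (visitor_id, page) pairs.

    Returns (unique_visitors, page_impressions). Every request counts as an
    impression, including repeat visits and reloads by the same visitor.
    """
    requests = list(requests)
    unique_visitors = len({visitor for visitor, _page in requests})
    return unique_visitors, len(requests)

# Hypothetical requests; the visitor identifiers are documentation addresses.
sample = [
    ("192.0.2.1", "/index.html"),
    ("192.0.2.1", "/index.html"),      # reload by the same visitor
    ("192.0.2.1", "/nursing-uk.html"),
    ("192.0.2.2", "/index.html"),
]
unique, impressions = visitors_and_impressions(sample)

# Figures reported by the Extreme tracker for 13 May to 31 December 1998.
impressions_per_visitor = round(118512 / 39091, 2)
print(unique, impressions, impressions_per_visitor)
```

On the reported figures, each unique visitor accounted for roughly three page impressions on average.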

 

Hits per day 

The number of hits on the site varies with the day of the week, showing a visible drop in hits at weekends, as can be seen in the example in Table 1.

 

Table 1

Table 2 shows the hits coming to the site by hour of the day in Greenwich Mean Time (GMT) between February 1998 and July 1999. The peak level of hits occurred during 15.00-15.59 and the lowest level during 07.00-07.59.

 

Table 2

The hits for different pages vary in their pattern across the day, e.g. Nursing-UK (Table 3) and Nursing-North-America (Table 4). Table 3 shows the hits coming to the site by hour of the day in Greenwich Mean Time (GMT) from the UK between January 1998 and July 1999. The peak level of hits occurred during 14.00-14.59 and the lowest level during 04.00-04.59.

   

Table 3

 

Table 4 shows the hits coming to the site by hour of the day in Greenwich Mean Time (GMT) from North America between January 1998 and July 1999. The peak level of hits occurred during 20.00-20.59 and the lowest level during  10.00-10.59.  

Table 4
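The hourly and daily breakdowns reported in Tables 1 to 4 amount to grouping the time of each hit by hour of the day and day of the week in GMT. A minimal Python sketch of that grouping follows; the timestamps are invented for illustration, and the trackers' own method of bucketing is not documented here.

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def hits_by_hour_and_weekday(timestamps):
    """Group timezone-aware timestamps by GMT hour and by GMT weekday."""
    by_hour, by_weekday = Counter(), Counter()
    for ts in timestamps:
        gmt = ts.astimezone(timezone.utc)
        by_hour[gmt.hour] += 1
        by_weekday[WEEKDAYS[gmt.weekday()]] += 1
    return by_hour, by_weekday

# Hypothetical hit times, all given in GMT.
sample = [
    datetime(1998, 7, 6, 15, 10, tzinfo=timezone.utc),   # Monday
    datetime(1998, 7, 6, 15, 40, tzinfo=timezone.utc),   # Monday
    datetime(1998, 7, 7, 14, 20, tzinfo=timezone.utc),   # Tuesday
    datetime(1998, 7, 11, 7, 30, tzinfo=timezone.utc),   # Saturday
]

by_hour, by_weekday = hits_by_hour_and_weekday(sample)
peak_hour = by_hour.most_common(1)[0][0]
print(f"peak hour: {peak_hour:02d}.00-{peak_hour:02d}.59 GMT")
```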

The hits per month from May 1998 to July 1999 are shown in Table 5. This revealed a gradual rise during 1998, with a fall around the Christmas period and then a large rise in January 1999.

Table 5

The visitors to the site are coming from all over the world, as shown in Table 6. The largest number was from UK (.uk) domains, followed by Network (.net), Commercial (.com) and unknown addresses, which are impossible to classify. The next largest section was from the USA, primarily educational institutions (.edu), and then Australia (.au) and Canada (.ca). These are divided by continent in Table 7.

Table 6  Domains/Countries from which hits are coming - 13 May 98-16 July 99 (Source Extreme)

| URL Extension | Country or Network | Number of Hits | Percentage of Hits |
|---|---|---|---|
| .uk | United Kingdom | 18883 | 21.61% |
| .net | Network | 18600 | 21.29% |
| .com | Commercial | 18598 | 21.29% |
| - | unknown | 18555 | 21.24% |
| .edu | US Educational | 3358 | 3.84% |
| .au | Australia | 1920 | 2.19% |
| .ca | Canada | 1141 | 1.30% |
| .org | Non-Profit Organisations | 751 | 0.85% |
| .us | United States | 593 | 0.67% |
| .se | Sweden | 515 | 0.58% |
| .nz | New Zealand | 495 | 0.56% |
| .ie | Ireland | 348 | 0.39% |
| .de | Germany | 341 | 0.39% |
| .be | Belgium | 284 | 0.32% |
| .jp | Japan | 250 | 0.28% |
| .sg | Singapore | 190 | 0.21% |
| .no | Norway | 165 | 0.18% |
| .nl | Netherlands | 161 | 0.18% |
| .gov | US Government | 160 | 0.18% |
| .fi | Finland | 156 | 0.17% |
| .mil | US Military | 152 | 0.17% |
| .it | Italy | 129 | 0.14% |
| .is | Iceland | 114 | 0.13% |
| .dk | Denmark | 110 | 0.12% |
| .ch | Switzerland | 102 | 0.11% |

+ over 80 other countries/domains with less than 100 (0.1%) hits.
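The classification in Table 6 rests on the last part of the visitor's host name. The Python sketch below shows one way such a classification could be made, using only labels that appear in Table 6. Treating a bare IP address (one that could not be resolved to a host name) as "unknown" is my assumption about how the tracker arrives at that category, and the host names are invented.

```python
from collections import Counter

# A subset of the URL extensions and labels listed in Table 6.
DOMAIN_LABELS = {
    "uk": "United Kingdom",
    "net": "Network",
    "com": "Commercial",
    "edu": "US Educational",
    "au": "Australia",
    "ca": "Canada",
    "org": "Non-Profit Organisations",
}

def classify_host(host):
    """Map a host name to a Table 6 label using its final extension."""
    name = host.strip().rstrip(".").lower()
    last = name.rsplit(".", 1)[-1]
    # Assumption: unresolved numeric addresses are what the tracker calls "unknown".
    if "." not in name or last.isdigit():
        return "unknown"
    return DOMAIN_LABELS.get(last, "other")

# Hypothetical visitor hosts.
sample_hosts = [
    "pc12.example.ac.uk",
    "dialup-7.example.net",
    "proxy.example.com",
    "192.0.2.17",
    "library.example.edu",
]
tally = Counter(classify_host(h) for h in sample_hosts)
print(tally)
```

Note that an extension such as .net or .com says nothing about the visitor's country, which is why those rows cannot be assigned to a continent.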

Table 7  Continents from which hits are coming - 13 May 98-16 July 99 (Source Extreme)

| Continent | Number of hits | Percentage of hits |
|---|---|---|
| Unknown | 37957 | 43.45% |
| North-America | 24040 | 27.52% |
| Europe | 21827 | 24.98% |
| Australia | 2417 | 2.76% |
| Asia | 904 | 1.03% |
| South America | 93 | 0.10% |
| Africa | 66 | 0.07% |
| Central America | 45 | 0.05% |

 

The way in which users arrive at the site showed that most are clicking on a link on another web site, rather than using a search engine, as shown in Table 8. Very few of the users recorded were arriving via email or Usenet newsgroup messages. The most popular search engines that were used are shown in Table 9. In total, 1457 different keywords were used; the most popular are shown in Table 10.

Over 2500 other web sites have links to this one. The sites that are providing the most referrals in the form of hyperlinks which point to this site are shown in Table 11.
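A tracker can only sort arrivals into these categories by inspecting the referring address that the browser reports. The Python sketch below shows one plausible way of doing so; it is not the Extreme tracker's actual method, which is not documented here. The search engine host names, their query parameter names and the example search URL are assumptions made for illustration, and email referrals are left out because I do not know how the tracker recognises them.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping of search engine hosts to the query parameter holding the keywords.
SEARCH_HOSTS = {
    "www.excite.com": "search",
    "search.yahoo.com": "p",
    "www.altavista.com": "q",
}

def classify_referrer(referrer):
    """Return (source, keywords) for a referring address."""
    parts = urlparse(referrer)
    if parts.scheme == "file":
        return "Harddisk", []          # e.g. a bookmarks file saved locally
    if parts.scheme in ("news", "nntp"):
        return "Usenet", []
    host = parts.netloc.lower()
    if host in SEARCH_HOSTS:
        terms = parse_qs(parts.query).get(SEARCH_HOSTS[host], [""])[0]
        return "Searchengine", terms.lower().split()
    return "Website", []

examples = [
    "http://medweb.bham.ac.uk/nursing/",                    # from Table 11
    "http://www.excite.com/search.gw?search=nursing+care",  # hypothetical query
    "file:///C:/bookmarks.htm",
    "news:sci.med.nursing",
]
results = [classify_referrer(r) for r in examples]
for referrer, (source, keywords) in zip(examples, results):
    print(source, keywords, referrer)
```

Counting the keywords returned for each search engine referral would give a table of the same form as Table 10.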

 

Table 8  Referral Sources 13 May 98-16 July 99 (Source Extreme)

| Source | Number of hits | Percentage of hits |
|---|---|---|
| Website | 34031 | 73.75% |
| Searchengine | 11329 | 24.55% |
| Email | 426 | 0.92% |
| Harddisk | 280 | 0.60% |
| Usenet | 77 | 0.16% |

 

Table 9  Most popular search engines 13 May 98-16 July 99 (Source Extreme)

| Search Engine | Number of referrals | Percentage of referrals |
|---|---|---|
| Excite | 5227 | 46.13% |
| Yahoo | 2911 | 25.69% |
| Infoseek | 1470 | 12.97% |
| WebCrawler | 498 | 4.39% |
| Altavista | 252 | 2.22% |
| Looksmart | 210 | 1.85% |
| Lycos | 177 | 1.56% |
| Hotbot | 118 | 1.04% |
| Goo! | 117 | 1.03% |

+ 12 others with less than 1.00%

 

Table 10  Keywords used in search engines 13 May 98-16 July 99 (Source Extreme)

| Word | Number | Percentage |
|---|---|---|
| Nursing | 3290 | 25.21% |
| Care | 820 | 6.28% |
| Health | 806 | 6.17% |
| Nurses | 414 | 3.17% |
| UK | 340 | 2.60% |
| Nurse | 277 | 2.12% |

 

Table 11  Top referring pages 13 May 98-16 July 99 (Source Extreme)

| Number of Referrals | Percentage | Referring site |
|---|---|---|
| 1679 | 4.93% | http://medweb.bham.ac.uk/nursing/ |
| 1040 | 3.05% | http://www.shef.ac.uk/~nr1rw/ |
| 824 | 2.42% | http://ukplus.co.uk/ukplus/owa/pkg_ie4_search.p_text_search |
| 528 | 1.55% | http://www.springnet.com/pn/pnels.htm |
| 443 | 1.30% | http://www.wwnurse.com/topsites/ |
| 394 | 1.15% | http://isd1/extweb/nurseinfo.html |
| 373 | 1.09% | http://www.healthprofessionals.com/links.htm |
| 345 | 1.01% | http://www.freeserve.ukplus.co.uk/dynamic/freeserve/results.html |
| 344 | 1.01% | http://www.healthcentre.org.uk/np/links.htm |
| 308 | 0.90% | http://www.springnet.com/sn/snels.htm |
| 305 | 0.89% | http://med714.bham.ac.uk/nursing/ |
| 299 | 0.87% | http://www.nursing.soton.ac.uk/Nav/Outside.htm |
| 267 | 0.78% | http://www.man.ac.uk/rcn/d&udatabase.html |
| 260 | 0.75% | http://www.slackinc.com/allied/allnet-w.htm |
| 245 | 0.72% | http://www.lib.uiowa.edu/hardin/md/nurs.html |
| 242 | 0.71% | http://www.nursing-standard.co.uk/nsites.htm |
| 241 | 0.70% | http://www.man.ac.uk/bcsnsg/nursite.html |

Note: This does not take into account referrals from multiple pages within one domain e.g. all pages of Nursing Standard Online.

 

The visitors to the site were using a wide range of browsers (see Table 12), operating systems (see Table 13), screen resolutions (see Table 14) & screen colours (see Table 15).

Microsoft Internet Explorer (MSIE) in all its versions was the most often reported browser, with 49.03% of users. Netscape was second with 42.13%, and another 8.83% used other browsers. The full breakdown, including version numbers, is shown in Table 12.

The vast majority of hits were coming from Windows operating systems (96.01%), with 2.66% from Macs, as shown in Table 13. Windows 95 was the most used system, followed by Windows 98 and Windows 3.1.

The screen resolutions and colours being used also varied widely as shown in Tables 14 and 15.
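The browser family shares quoted above can be reproduced from the per-version counts in Table 12. The Python sketch below does this, counting the AOL, WebTV and remaining browsers as "Other". The percentages in the source appear to be truncated to two decimal places rather than rounded; that is my inference from the figures, so the sketch truncates as well.

```python
from collections import Counter

# Hit counts by browser version, copied from Table 12.
TABLE_12 = {
    "MSIE 4": 29179, "Netscape 4": 23332, "Netscape 3": 11974,
    "MSIE 3": 10526, "AOL 4": 3737, "MSIE 5": 3096, "AOL 3": 2991,
    "Netscape 2": 1506, "WebTV 1": 644, "AOL-IWENG 3": 206,
    "Other": 105, "MSIE 2": 36, "Opera 3": 29, "Ibrowse 1": 2,
    "AmigaVoyager 2": 1,
}

def family(browser):
    """Strip the version number and group everything else as 'Other'."""
    name = browser.rsplit(" ", 1)[0]
    return name if name in ("MSIE", "Netscape") else "Other"

family_hits = Counter()
for browser, hits in TABLE_12.items():
    family_hits[family(browser)] += hits

total = sum(TABLE_12.values())
# Integer arithmetic truncates to hundredths of a percent (assumed convention).
shares = {name: (hits * 10000 // total) / 100 for name, hits in family_hits.items()}
print(total, shares)
```

The total of the listed rows is 87,364 hits, and the three shares come out at 49.03%, 42.13% and 8.83%, matching the figures in the text.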

 

Table 12  Browsers used 13 May 98-27 July 99 (Source Extreme)

| Browser | No. of hits | Percentage of hits |
|---|---|---|
| MSIE 4 | 29179 | 33.39% |
| Netscape 4 | 23332 | 26.70% |
| Netscape 3 | 11974 | 13.70% |
| MSIE 3 | 10526 | 12.04% |
| AOL 4 | 3737 | 4.27% |
| MSIE 5 | 3096 | 3.54% |
| AOL 3 | 2991 | 3.42% |
| Netscape 2 | 1506 | 1.72% |
| WebTV 1 | 644 | 0.73% |
| AOL-IWENG 3 | 206 | 0.23% |
| Other | 105 | 0.12% |
| MSIE 2 | 36 | 0.04% |
| Opera 3 | 29 | 0.03% |
| Ibrowse 1 | 2 | 0.00% |
| AmigaVoyager 2 | 1 | 0.00% |

 

Table 13  Operating systems 13 May 98-17 July 99 (Source Extreme)

| Operating System | Number | Percentage |
|---|---|---|
| Windows 95 | 53261 | 60.96% |
| Windows 98 | 15250 | 17.45% |
| Windows 3.1 | 7742 | 8.86% |
| Windows NT | 7628 | 8.74% |
| Macintosh | 2327 | 2.66% |
| Web TV | 558 | 0.63% |
| Other | 384 | 0.43% |
| SunOS 5 | 85 | 0.09% |
| Linux 2 | 38 | 0.04% |

 

Table 14  Screen Resolutions 13 May 98-27 July 99 (Source Extreme)

| Screen Resolution | Number | Percentage |
|---|---|---|
| 800x600 | 33814 | 63.40% |
| 640x480 | 11257 | 21.10% |
| 1024x768 | 6480 | 12.15% |
| Other | 1019 | 1.91% |
| 1152x864 | 398 | 0.74% |
| 1280x1024 | 319 | 0.59% |
| 1600x1200 | 41 | 0.07% |

 

Table 15  Screen Colours 13 May 98-27 July 99 (Source Extreme)

| Screen Colours | Number | Percentage |
|---|---|---|
| 16 Bit (65K) | 26016 | 48.78% |
| 8 Bit (256) | 14101 | 26.44% |
| 32 Bit (16.7M) | 6448 | 12.09% |
| 24 Bit (16.7M) | 5716 | 10.71% |
| Other | 1047 | 1.96% |

 

Discussion

The known inaccuracy of the counter and tracker, and the lack of published data from similar sites, make it difficult to fully explore the significance of these results. However, some inferences can be drawn about the users of the site.

The visible fall in hits at the weekend is likely due to the intended academic nature of the site and the likelihood that many users are accessing it from an Internet connection within a university or college, or from a work setting. However, a large number of users are from the UK, where Internet connections from hospitals are known to be poor (Kerrimore, 1999) and the number of people accessing the Internet from home is increasing (DofH, 1999), so this trend may change.

The times of the day when the site is hit reflect a UK or Western European user profile, and users in work or academic settings, with most hits during the working day (09.00-17.00 GMT) and the evening period (18.00-22.00). This is particularly true for the Nursing-UK section. The overall pattern is overlaid with hits from North America, where the hit pattern is 5-7 hours behind the UK.
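The relationship between the UK and North American patterns can be checked with simple arithmetic on the peak and lowest hours reported for Table 4. The Python sketch below shifts those GMT hours by 5, 6 and 7 hours, the range of offsets mentioned above. It ignores daylight saving time and zones further west, so it is only a rough check.

```python
def local_hour(gmt_hour, hours_behind_gmt):
    """Convert an hour of the day in GMT to local time for a zone behind GMT."""
    return (gmt_hour - hours_behind_gmt) % 24

NA_PEAK_GMT = 20     # North America peak, 20.00-20.59 GMT (Table 4)
NA_LOWEST_GMT = 10   # North America lowest, 10.00-10.59 GMT (Table 4)

peak_local = {offset: local_hour(NA_PEAK_GMT, offset) for offset in (5, 6, 7)}
lowest_local = {offset: local_hour(NA_LOWEST_GMT, offset) for offset in (5, 6, 7)}

for offset in (5, 6, 7):
    print(f"GMT-{offset}: peak {peak_local[offset]:02d}.00, "
          f"lowest {lowest_local[offset]:02d}.00 local time")
```

On these assumptions the North American peak falls between 13.00 and 15.00 local time and the lowest point between 03.00 and 05.00, which is close to the UK pattern in Table 3 (peak 14.00-14.59, lowest 04.00-04.59 GMT). This suggests that both groups mainly use the site during their own working day.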

The variation of hits between months of the year shows a gradual rise from May through July (2000 to 4000 hits), a decrease in August (3500 hits), a sharp increase in October (7700 hits) and then a fall over the Christmas period of 1998 (4300 hits), perhaps reflecting the academic calendar. Usage peaked in January 1999 at 8400 hits. This rise corresponded with an upgrade of the site followed by a publicity campaign to mailing lists and newsgroups and resubmission of the site to search engines and directories, as described elsewhere (Ward, 1998).

Analysis of the countries from which visitors are arriving at the site is more difficult, with over 20% being unknown and another 40% coming from .net or .com domains, which cannot be resolved to a particular country but are likely to be predominantly from North America. The country with the most hits was the UK, probably reflecting my own biases and the publicity the site has received within the UK nursing community. In addition, the number of nursing and healthcare related web sites in North America far outstrips those elsewhere in the world, potentially meaning that this site has less competition in the UK or appeals to a different group of users. Unfortunately, it is not possible with this tracker to identify how many of the UK hits are coming from academic addresses (.ac), from the National Health Service (.nhs), etc. The preponderance of North American and European users reflects the variation in access capabilities in these countries when compared to Africa, Asia, and Central and South America. The fact that the site is only in English may also discourage use from some parts of the world.

The way in which users find this or any other site, from the approximately 500 million available on the Internet, is complex and an area which would benefit from further research. Many Internet commentators suggest that most sites are found predominantly through the use of search engines. This is not reflected by the statistics for this site, which show almost 75% of visitors arriving via a hyperlink from another web site. This shows the importance of ensuring that web sites are listed in the links pages of other related sites. Over 2500 sites have links to this one, with the Nurse site (http://medweb.bham.ac.uk/nursing) providing nearly 5% of referrals. The Nurse site is no longer maintained, and the first site to which it redirects its users is Nursing & Health Care Resources on the Net. Of those that did find the site via a search engine, almost half used Excite, the next largest being Yahoo, which includes a link to this site from various relevant sections. This shows the importance for web site creators of going through the submission processes for the various search engines and attempting, via the use of metatags and resubmission, to keep their entry amongst the first 10 or 20 results returned by each search engine for relevant keywords (Ward, 1998). The most popular search terms that enabled people to find this site were nursing, care, health, nurses, UK and nurse, although over 1000 other terms had been included in searches.

The numbers arriving following messages by email or to Usenet newsgroups were very small, perhaps showing that this is not a particularly productive way of promoting a web site.

The variations in operating systems, browsers, screen resolutions and screen colours are also relevant for web site designers. Unless some users are to be excluded, site design should be tested in a range of these configurations, and the use of special functions, which are specific to Netscape, MSIE, or other browsers, should be avoided.

 

Conclusions

It is important for both Internet users and web site providers to be aware of some issues to improve web site access. The user profile of a site will generally reflect its content. Sites need to be advertised in search engines, and site owners need to be encouraged to link to other related sites.

Users who are made aware of the variations in usage by day of the week and time of day can gain faster access to materials. Usage patterns will vary with country and type of access. I believe the access pattern for this site is mirrored at many other sites. If users are able to use the World Wide Web during the quieter periods (05.00-09.00 GMT, and Saturdays and Sundays), their service is likely to be faster, because there is less load on the servers holding these pages and less pressure on general Internet bandwidth, particularly for transatlantic traffic.

The web is an international medium and this site is only available in English. This could have had a large impact on the number of countries represented. This international flavor should be taken into account, stressing the importance of recognising cultural and language differences. As language translators become available, this will not be an issue for web sites.

The technical variations in the users' machines, operating systems and configurations mean that site designers must take account of different user needs and test their materials on various set-ups to ensure they can be used by as many people as possible.

This paper has used freely available technologies to identify some characteristics about the users of one particular web site. Further study is needed to explore the profile of Internet users and their behaviours. Once these characteristics are identified, the creators and maintainers of World Wide Web resources need to recognise the variations and similarities in users, and the capabilities of their software, to ensure that information provision more closely meets the needs of the user.