Websites can collect a lot of information about you just from your browsing.

This is sometimes useful and sometimes unsettling, but either way it's interesting to know how some of it works.

What am I telling them?

When you ask for a particular web page, the request doesn't just ask for the URL; it also includes quite a lot of additional detail [1]. Two interesting sets of data are (1) details about your browser and (2) your network address.
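To make the "additional detail" concrete, here is a small Python sketch that splits a raw request into its request line and headers. The request text, the host name www.example.com and the function name parse_request are all made up for illustration; a real browser sends more headers than this.

```python
# A hypothetical raw HTTP/1.1 request, roughly as a browser might send it.
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Maxthon)\r\n"
    "Accept-Language: en-au,en;q=0.5\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw request into (method, path, headers)."""
    # Everything before the first blank line is the request line plus headers.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


method, path, headers = parse_request(RAW_REQUEST)
print(method, path)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Note that the network address is not one of these headers; the server learns it from the connection the request arrives on.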

User-Agent

The user-agent is a string that generally identifies your browser and operating system. A user-agent string might look like:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Maxthon)

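Although it starts with 'Mozilla', this is actually from MS Internet Explorer 6 on Windows XP (you can tell from the MSIE 6.0 and Windows NT 5.1 bits). The trailing Maxthon token indicates the Maxthon browser shell, which was built on Internet Explorer.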
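As a rough illustration, the sample string above can be pulled apart in Python. This is only a sketch: the function name split_user_agent is invented here, and it assumes the simple "product (token; token; ...)" shape of the sample, which many real user-agent strings do not follow.

```python
import re

SAMPLE_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Maxthon)"


def split_user_agent(ua):
    """Return (product, tokens) for a user-agent with one bracketed comment."""
    match = re.match(r"^(\S+)\s*\((.*?)\)", ua)
    if not match:
        # No bracketed comment found: return the string unchanged.
        return ua, []
    product = match.group(1)
    tokens = [token.strip() for token in match.group(2).split(";")]
    return product, tokens


product, tokens = split_user_agent(SAMPLE_UA)
print(product)
print(tokens)
```

The 'MSIE 6.0' entry in the resulting token list is the bit that gives Internet Explorer away.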

This information lets web sites display different content for different browsers - if you are using a tablet, you might get the 'mobile' version instead of the 'full' version. This is a Good Thing.
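A site might make the mobile/full decision along these lines. This is a deliberately naive Python sketch: the list of hint words and the name choose_version are assumptions for illustration, and real sites tend to rely on much more thorough detection.

```python
# Hypothetical substrings treated as hints of a phone or tablet.
MOBILE_HINTS = ("mobile", "android", "iphone", "ipad")


def choose_version(user_agent):
    """Return 'mobile' or 'full' based on a naive user-agent check."""
    ua = (user_agent or "").lower()
    if any(hint in ua for hint in MOBILE_HINTS):
        return "mobile"
    return "full"


print(choose_version("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Maxthon)"))
print(choose_version("Mozilla/5.0 (iPad; CPU OS 9_3 like Mac OS X)"))
```

Because the browser supplies the string itself, it is a hint rather than a guarantee; a missing user-agent simply falls through to the full version here.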

IP Address

When you ask for a page, the web site has to know where to send it back to - your network address. This is generally determined by your ISP - even if you have a private network with your own addresses, once you get to the internet it's your public IP address that counts.
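The private/public distinction can be checked with Python's standard ipaddress module. The addresses below are just examples (two from the reserved private ranges, one well-known public one), and the function name describe is made up for this sketch.

```python
import ipaddress


def describe(address):
    """Label an address as 'private' (local network only) or 'public'."""
    ip = ipaddress.ip_address(address)
    return "private" if ip.is_private else "public"


for address in ("192.168.1.20", "10.0.0.5", "8.8.8.8"):
    print(address, describe(address))
```

A web site only ever sees the public kind; the private address your laptop uses at home is hidden behind your router.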

Because the allocation of IP addresses to countries and ISPs is public, your address can generally be traced to your city and sometimes even your suburb. How accurate this tracing is depends on various factors, but it can be correct up to 98% of the time.
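The idea behind this tracing can be sketched as a lookup of the address against a table of allocated blocks. Everything in the table below is invented: the blocks are documentation-only address ranges and the place names are fictional. Real geolocation services use far larger, regularly updated databases.

```python
import ipaddress

# Made-up allocation table: documentation-only address blocks mapped to
# invented locations, purely for illustration.
ALLOCATIONS = [
    (ipaddress.ip_network("192.0.2.0/24"), "Exampleville, Country A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Sampleton, Country B"),
]


def locate(address):
    """Return the place whose block contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for network, place in ALLOCATIONS:
        if ip in network:
            return place
    return None


print(locate("192.0.2.44"))
print(locate("203.0.113.9"))
```

An address that falls outside every known block simply cannot be placed, which is one of the factors that limits accuracy.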

This can be used for customising content, but not always in a good way:

  • Good: If you’re in France, we should probably show you the French translation - unless you have a cookie that says you want the English version.
  • Not Good: Find sexy singles in [your location].
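The 'Good' case above might look something like this in Python. The cookie name lang, the country-to-language table and the function name choose_language are all assumptions made for this sketch; the country code would come from whatever IP lookup the site uses.

```python
from http.cookies import SimpleCookie

# Hypothetical mapping from a geolocated country code to a default language.
DEFAULT_LANGUAGE = {"FR": "fr"}


def choose_language(country_code, cookie_header=""):
    """Prefer the visitor's stored choice; otherwise guess from location."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    # "lang" is a made-up cookie name holding an earlier explicit choice.
    if "lang" in cookie:
        return cookie["lang"].value
    return DEFAULT_LANGUAGE.get(country_code, "en")


print(choose_language("FR"))
print(choose_language("FR", "lang=en"))
```

Letting the cookie win over the location guess is the design choice that keeps this on the 'Good' side: the visitor's stated preference beats the site's assumption.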

Can I see this stuff?

A quick web search will suggest many sites that display your user agent, IP address and other details.

Can I use this stuff?

On your own web site, you can see these details for yourself.

The key is having access to the details of the request, which the web server passes to any Common Gateway Interface (CGI) compatible [2] language as environment variables.
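As a minimal sketch in Python, assuming the script runs under a CGI-style server that sets the HTTP_USER_AGENT and REMOTE_ADDR variables described in [2], the two items discussed earlier can be picked out like this. The function name request_details is invented for the example.

```python
import os


def request_details(environ=os.environ):
    """Pick the user-agent and client address out of a CGI-style environment."""
    return {
        "user_agent": environ.get("HTTP_USER_AGENT", "unknown"),
        "ip_address": environ.get("REMOTE_ADDR", "unknown"),
    }


if __name__ == "__main__":
    details = request_details()
    # A CGI response is headers, then a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    print("User agent:", details["user_agent"])
    print("IP address:", details["ip_address"])
```

Run outside a web server, neither variable is set and both values come back as 'unknown'.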

Perl example

#! perl
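use strict;
use warnings;
use CGI;    # no longer bundled with Perl since 5.22; install from CPAN if needed

my $q = CGI->new;
print $q->header(-type=>'text/html');
print $q->start_html(-title=>'Environment');
print "<h1>Environment</h1>\n<table border='1' cellspacing='1' cellpadding='2'>\n<tr><td>Property</td><td>Value</td></tr>\n";
# The web server hands the request details to the script as environment variables.
foreach my $key ( sort keys %ENV )
{
     # Escape each value: headers such as the user-agent are supplied by the client.
     my $value = $q->escapeHTML($ENV{$key});
     print "<tr><td>$key</td><td>$value</td></tr>\n";
}
print "</table>\n";
print $q->end_html();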

PHP example

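<?php
echo '<h1>Environment</h1>';
echo "<table border='1' cellspacing='1' cellpadding='2'>
<tr><td>Property</td><td>Value</td></tr>";
foreach ($_SERVER as $key => $value)
{
     // Some entries (such as argv) can be arrays; flatten them for display.
     if (is_array($value)) { $value = implode(' ', $value); }
     // Escape the output: headers such as the user-agent are supplied by the client.
     echo "<tr><td>", htmlspecialchars($key), "</td><td>", htmlspecialchars((string) $value), "</td></tr>";
}
echo "</table>";
?>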

If you use a content management system such as Joomla, there are probably pre-built extensions and libraries that will make this easier.

Big Scary Warning

Don't display all the server variables on any web page that is publicly accessible. The information included may expose details about your site to hackers.
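If you do want to show something on a public page, one safer approach is to display only a short list of variables you have chosen deliberately. The Python sketch below assumes a CGI-style environment; the contents of SAFE_KEYS and the name safe_rows are illustrative choices, not a recommendation of exactly which variables are safe for your site.

```python
import html

# Hypothetical allowlist: only these variables are ever displayed.
SAFE_KEYS = ("HTTP_USER_AGENT", "REMOTE_ADDR", "HTTP_ACCEPT_LANGUAGE")


def safe_rows(environ):
    """Build table rows for allowlisted variables only, HTML-escaped."""
    rows = []
    for key in SAFE_KEYS:
        if key in environ:
            value = html.escape(environ[key])
            rows.append(f"<tr><td>{key}</td><td>{value}</td></tr>")
    return rows


example_env = {
    "HTTP_USER_AGENT": "<script>alert(1)</script>",
    "REMOTE_ADDR": "192.0.2.10",
    "DOCUMENT_ROOT": "/var/www/secret-layout",
}
print("\n".join(safe_rows(example_env)))
```

Anything not on the list, such as the server's file paths, never reaches the page, and the escaping stops a crafted user-agent from injecting markup.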

You've been told.

What about cookies?

Cookies don't create the information, they just store it for later use (the cookie and its associated data are included with each subsequent request you make to the originating site).

Cookies can be populated with information obtained from the user agent or from other data available to the server (or a combination).
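The round trip can be sketched with Python's standard http.cookies module. The cookie name site_version and both function names are made up for this example; the stored value stands in for something the server worked out from the user agent.

```python
from http.cookies import SimpleCookie


def make_set_cookie(version):
    """Server side: build the header that asks the browser to store a value."""
    cookie = SimpleCookie()
    cookie["site_version"] = version  # hypothetical cookie name
    return cookie.output(header="Set-Cookie:")


def read_cookie(cookie_header):
    """Server side, on a later request: read the stored value back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("site_version")
    return morsel.value if morsel is not None else None


print(make_set_cookie("mobile"))
print(read_cookie("site_version=mobile; other=1"))
```

The first function produces the response header the server sends once; the second reads the Cookie header the browser then sends back with each later request.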

References

[1] Hypertext Transfer Protocol -- HTTP/1.1 (RFC 2616); see User-Agent in section 14.43.

[2] The Common Gateway Interface (CGI) Version 1.1 (RFC 3875); see Request Meta-Variables in section 4.1.