Awstats
About
Advanced Web Statistics 6.x - http://awstats.sourceforge.net/
Implementation
There are typically three stages to implementing awstats (I have done about 5 or 6 of these implementations now so it's become clockwork).
1. Gather the logs. This can involve some serious headaches, because you might run into missing or incorrectly formatted data, huge logfiles (never rotated?) and the like. Awstats can process many different log formats, though, and has a flexible LogFormat directive to get that working. Another twist to this step is that you often find multiple nodes (head-ends), so you have multiple logfiles to combine. The awstats utility logresolvemerge.pl handles this nicely. I usually create a folder called rollup/ as the output folder for this step.
Example:
cd rollup/
/usr/local/www/awstats/tools/logresolvemerge.pl \
    -showsteps \
    ../web1/access.log.200802.gz \
    ../web2/access.log.200802.gz \
    | gzip --fast > access.log.200802.gz
2. Process the logs to date. This is what I call "catsup" mode (a silly play on catch-up, like ketchup): you have to decide how far back you want to go date-wise. Usually 6 months or less is sufficient. This step also requires the initial setup of the awstats.www.example.com.conf file.
Example:
/usr/local/awstats/wwwroot/cgi-bin/awstats.pl \
    -config=www.example.com -update -showsteps \
    -LogFile="zcat 2008-02.www_access_log.gz |"
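When catching up over several months, the command above has to be repeated once per logfile, oldest first, because awstats discards records older than the newest one it has already processed. A minimal sketch of such a loop follows. The helper names list_months and catch_up, the DRY_RUN switch, and the assumption that every month follows the 2008-02.www_access_log.gz naming pattern are illustrative only and not part of awstats.

```shell
#!/bin/sh
# Sketch only. The awstats.pl path and config name are copied from the
# example above; everything else here is a hypothetical helper.

AWSTATS=/usr/local/awstats/wwwroot/cgi-bin/awstats.pl
CONFIG=www.example.com

# Print COUNT consecutive months as YYYY-MM, oldest first,
# starting at YEAR MONTH (pass the month without a leading zero).
list_months() {
    year=$1 month=$2 count=$3
    while [ "$count" -gt 0 ]; do
        printf '%04d-%02d\n' "$year" "$month"
        month=$((month + 1))
        if [ "$month" -gt 12 ]; then
            month=1
            year=$((year + 1))
        fi
        count=$((count - 1))
    done
}

# Feed each month's log to awstats.pl in chronological order.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
catch_up() {
    for ym in $(list_months "$@"); do
        log="${ym}.www_access_log.gz"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf '%s -config=%s -update -showsteps -LogFile="zcat %s |"\n' \
                "$AWSTATS" "$CONFIG" "$log"
        else
            "$AWSTATS" -config="$CONFIG" -update -showsteps \
                -LogFile="zcat $log |"
        fi
    done
}
```

Calling catch_up 2007 9 6 prints the six commands for September 2007 through February 2008; running it again with DRY_RUN=0 executes them.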
3. Process the logs on an ongoing basis. This will require a cronjob or two. The first cronjob copies logfiles down from wherever they live and puts them in a ready-to-process state (see logresolvemerge.pl from step 1). The other cronjob actually calls awstats.pl with the -update flag.
Example:
# Note: this works because LogFile was set appropriately in the config file
/usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=www.example.com -update
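The two cron tasks from step 3 can also be chained in one wrapper, so that the update is skipped whenever the fetch fails. This is a sketch under stated assumptions: the script name awstats-nightly.sh, the schedule, and the empty fetch_logs placeholder are hypothetical, and the copy/merge logic has to be filled in per site.

```shell
#!/bin/sh
# Sketch of a single nightly job covering both cron tasks from step 3.
# Hypothetical crontab entry, running at 02:15 every night:
#   15 2 * * * /usr/local/bin/awstats-nightly.sh run

# Path and config name are taken from the example above.
AWSTATS=/usr/local/awstats/wwwroot/cgi-bin/awstats.pl
CONFIG=www.example.com

# Site-specific placeholder: copy the logs down and merge them (see step 1)
# into the file that LogFile points at. Intentionally left empty here.
fetch_logs() {
    :
}

# Run the update only if the logs were fetched successfully.
nightly() {
    if ! fetch_logs; then
        echo "log fetch failed, skipping awstats update" >&2
        return 1
    fi
    "$AWSTATS" -config="$CONFIG" -update
}

# Only act when invoked as "awstats-nightly.sh run".
if [ "${1:-}" = run ]; then
    nightly
fi
```

Keeping both steps in one script avoids having to guess how long the fetch takes when scheduling the second cronjob.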
Generating static HTML & PDF reports
Here is another example, showing how to generate a set of static HTML/PDF files from the metadata. This is sometimes required for high-traffic sites where the metadata for any given month can run to hundreds of megabytes, which is too much for the CGI script to process dynamically.
/usr/local/www/apache22/data/tools/awstats_buildstaticpages.pl \
    -config=www.example.com \
    -month=03 \
    -year=2008 \
    -awstatsprog=/usr/local/www/awstats/cgi-bin/awstats.pl \
    -dir=/usr/local/www/awstats/wwwroot/
You can also add -buildpdf to those arguments (this relies on htmldoc being installed).
Refer to 1 and 2 for more information.
Using Maxmind GeoIP lookups
CentOS/RedHat
Install the GeoIP package from EPEL. This should work out of the box, as the GeoLite Country database appears to be bundled with the package in /usr/share/GeoIP/GeoIP.dat.
Debian/Ubuntu
New way: install these packages: libgeo-ip-perl, libgeo-ipfree-perl, geoip-bin
Old way:
Get the code and databases from Maxmind. Specifically, I use Geo-IP-1.31 (perl), Geo-IP-1.4.4 (c), and the free GeoIP (country) and GeoIPCity (city) databases. These get gunzipped, and the .dat files should be put in the /usr/local/share/GeoIP/ folder.
Testing
Once the packages are installed, test with geoiplookup (the second line below is its output):
geoiplookup 208.75.56.242
GeoIP Country Edition: US, United States
If you installed the GeoLite City database, here is an alternate invocation that uses it (again followed by its output):
geoiplookup 208.75.56.242 -f /usr/local/share/GeoIP/GeoLiteCity.dat
GeoIP City Edition, Rev 1: US, CA, San Francisco, 94109, 37.795700, -122.420898, 807, 415
Integrating
Now set these values in the awstats.www.example.com.conf file:
LoadPlugin="geoip GEOIP_STANDARD /usr/local/share/GeoIP/GeoIP.dat"
LoadPlugin="geoip_city_maxmind GEOIP_STANDARD /usr/local/share/GeoIP/GeoIPCity.dat"
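The path given to LoadPlugin has to point at a database file that actually exists, and that location differs between the EPEL package (/usr/share/GeoIP/) and the manual install (/usr/local/share/GeoIP/). The sketch below, built around a hypothetical helper find_geoip_db, prints a country-plugin line for whichever directory holds a readable GeoIP.dat. It only prints the line and does not touch the config file.

```shell
#!/bin/sh
# Sketch only: find_geoip_db is a hypothetical helper, and the two
# directories are the ones mentioned in the install notes above.

# Print the first readable GeoIP.dat found in the given directories.
# Returns non-zero when none of them contains one.
find_geoip_db() {
    for dir in "$@"; do
        if [ -r "$dir/GeoIP.dat" ]; then
            printf '%s\n' "$dir/GeoIP.dat"
            return 0
        fi
    done
    return 1
}

# Emit the matching LoadPlugin line, or a warning on stderr.
if db=$(find_geoip_db /usr/share/GeoIP /usr/local/share/GeoIP); then
    printf 'LoadPlugin="geoip GEOIP_STANDARD %s"\n' "$db"
else
    echo "no GeoIP.dat found in the usual locations" >&2
fi
```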
Another alternative is to simply use the geo-ipfree library (the libgeo-ipfree-perl package on Debian/Ubuntu).
Then add/uncomment this line in awstats.conf:
LoadPlugin="geoipfree"
Using WHOIS lookups
This plugin requires the Perl module Net/XWhois.pm (Net::XWhois). On Debian/Ubuntu this can be installed using:
aptitude install libnet-xwhois-perl
Then uncomment this line in awstats.www.example.com.conf:
LoadPlugin="hostinfo"
Processing Tips
If you are processing massive log files (more than 4 million hits per day), I recommend the following changes.
- Use SkipFiles="REGEX[.*\.(com|css|eot|gif|htc|jpg|js|png|rss|svg|swf|ttf)$]". Adjust to suit.
- Bump $LIMITFLUSH in awstats.pl from 5000 to 1000000 or more.
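Before committing to a SkipFiles pattern, it can help to check which URLs it would drop. The sketch below runs the same expression through grep -E. It assumes that extended regular expressions and the Perl regex engine used by awstats agree for a pattern this simple; the sample URLs and the helper is_skipped are made up for illustration.

```shell
#!/bin/sh
# Sketch only: approximates the SkipFiles regex with grep -E.
# The pattern is the part between REGEX[ and ] in the directive above.
SKIP_RE='.*\.(com|css|eot|gif|htc|jpg|js|png|rss|svg|swf|ttf)$'

# Succeeds when the given URL path matches the skip pattern.
is_skipped() {
    printf '%s\n' "$1" | grep -Eq "$SKIP_RE"
}

# Hypothetical sample URLs: report which would be skipped or counted.
for url in /index.html /css/site.css /img/logo.png /feed.rss /api/items; do
    if is_skipped "$url"; then
        echo "skip  $url"
    else
        echo "count $url"
    fi
done
```

Extending the alternation (for example with ico or woff) and re-running the loop is a quick way to confirm a change before reprocessing a large logfile.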