Awstats

From ConShell

About

Advanced Web Statistics 6.x - http://awstats.sourceforge.net/

Implementation

There are typically three stages to implementing awstats (I have done five or six of these implementations now, so it has become clockwork).

1. Gather the logs. This can involve some serious headaches, because you might have problems such as missing or incorrectly formatted data, huge logfiles (never rotated?) and the like. Awstats can process many different logfile formats, though, and has a flexible LogFormat directive to get that working. Another twist to this step is that you often have multiple nodes (head-ends), so there are multiple logfiles to combine. The awstats utility logresolvemerge.pl handles this nicely. I usually create a folder called rollup/ as the output folder for this step.

Example:

cd rollup/
/usr/local/www/awstats/tools/logresolvemerge.pl \
-showsteps \
../web1/access.log.200802.gz \
../web2/access.log.200802.gz \
| gzip --fast > access.log.200802.gz


2. Process the logs to-date. This is what I call catsup (silly, like ketchup) mode: you have to decide how far back you want to go, date-wise. Usually 6 months or less is sufficient. This step also requires the initial setup of the awstats.www.example.com.conf file.

Example:

/usr/local/awstats/wwwroot/cgi-bin/awstats.pl \
-config=www.example.com -update -showsteps \
-LogFile="zcat 2008-02.www_access_log.gz |"
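Because awstats drops records older than the last date it has already processed, a catch-up replay has to feed the months in chronological order, oldest first. A rough sketch of that loop follows; the month range and the rollup/access.log.YYYYMM.gz naming are assumptions, and the awstats.pl call is commented out so the month arithmetic stands on its own:

```shell
#!/bin/sh
# Replay several months of rotated logs, oldest month first.
start=200709
end=200802
m=$start
while [ "$m" -le "$end" ]; do
    echo "processing $m"
    # /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=www.example.com \
    #     -update -LogFile="zcat rollup/access.log.$m.gz |"
    m=$(date -d "${m}01 +1 month" +%Y%m)   # GNU date rolls 200712 -> 200801
done
```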


3. Process the logs ongoing. This will require a cronjob or two. The first cronjob copies logfiles down from wherever they live and puts them in a ready-to-process state (see logresolvemerge.pl from step 1). The other cronjob actually calls awstats.pl with the -update flag.

Example:

#Note this works because LogFile was set appropriately in the config file
/usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=www.example.com -update
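As a sketch, the two jobs might be wired into cron like this; the schedule is arbitrary and fetch-and-merge-logs.sh is a hypothetical wrapper around scp plus logresolvemerge.pl:

```shell
# m  h dom mon dow  command
# 1) Pull the latest logs from each head-end and merge them into rollup/
15 0 * * *  /usr/local/bin/fetch-and-merge-logs.sh
# 2) Let awstats consume whatever LogFile in the config points at
45 0 * * *  /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=www.example.com -update >/dev/null
```

Stagger the second job well after the first so the merged logfile is complete before awstats reads it.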


Generating static HTML & PDF reports

Another example shows how to generate a set of HTML/PDF files from the awstats data files. This is sometimes required for high-traffic sites where the data for any given month can run to hundreds of megabytes - too much for the CGI script to process dynamically.

/usr/local/www/apache22/data/tools/awstats_buildstaticpages.pl \
-config=www.example.com \
-month=03 \
-year=2008 \
-awstatsprog=/usr/local/www/awstats/cgi-bin/awstats.pl \
-dir=/usr/local/www/awstats/wwwroot/ 

You can also add -buildpdf to those arguments.

Refer to [1] and [2] for more information.

Using Maxmind GeoIP lookups

CentOS/RedHat

Install the GeoIP package from EPEL (e.g. yum install GeoIP). This should work out of the box, as the GeoLite Country database appears to be bundled with the package at /usr/share/GeoIP/GeoIP.dat

Debian/Ubuntu

New way: install these packages: libgeo-ip-perl, libgeo-ipfree-perl and geoip-bin (e.g. aptitude install libgeo-ip-perl libgeo-ipfree-perl geoip-bin)


Old way: get the code and databases from Maxmind. Specifically I use Geo-IP-1.31 (Perl), Geo-IP-1.4.4 (C) and also the free GeoIP (country) and GeoIPCity (city) databases - gunzip these and put the .dat files in the /usr/local/share/GeoIP/ folder.

Testing

Once the packages are installed, test with

geoiplookup 208.75.56.242
GeoIP Country Edition: US, United States

If you installed the GeoLite City database, here is an alternate invocation that uses it

geoiplookup 208.75.56.242 -f /usr/local/share/GeoIP/GeoLiteCity.dat 
GeoIP City Edition, Rev 1: US, CA, San Francisco, 94109, 37.795700, -122.420898, 807, 415

Integrating

Now set these values in the awstats.www.example.com.conf file

LoadPlugin="geoip GEOIP_STANDARD /usr/local/share/GeoIP/GeoIP.dat"
LoadPlugin="geoip_city_maxmind GEOIP_STANDARD /usr/local/share/GeoIP/GeoIPCity.dat"

Another alternative is to simply use the geo-ipfree library. To do so, add/uncomment this line in awstats.conf

LoadPlugin="geoipfree"

Using WHOIS lookups

This plugin requires the Perl module Net::XWhois (Net/XWhois.pm). On Debian/Ubuntu it can be installed using aptitude install libnet-xwhois-perl

Then uncomment the line in awstats.www.example.com.conf

LoadPlugin="hostinfo"

Processing Tips

If you are processing massive log files (more than 4 million hits per day), I recommend the following changes.

  • Use SkipFiles="REGEX[.*\.(com|css|eot|gif|htc|jpg|js|png|rss|svg|swf|ttf)$]". Adjust to suit.
  • Bump $LIMITFLUSH in awstats.pl from 5000 to 1000000 or more.
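The SkipFiles pattern body is a plain extended regex, so it can be sanity-checked with grep -E before it goes into the config (close enough to awstats' Perl matching for a pattern this simple; the sample URLs below are made up):

```shell
# The same ERE body awstats applies to each requested URL
regex='.*\.(com|css|eot|gif|htc|jpg|js|png|rss|svg|swf|ttf)$'

echo "/img/logo.png" | grep -Eq "$regex" && echo "skipped: /img/logo.png"
echo "/index.html"   | grep -Eq "$regex" || echo "counted: /index.html"
```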

Related