processing web logs

For a church website I maintain, I needed a way to produce graphical reports from raw web server log files. The web host it currently sits on does not offer awstats or cPanel, so I devised a way to pull the logs down via FTP, process them locally on an Ubuntu box, and post the reports back up to the web host.

Tools used
Bash scripting
ncftp / ftp (with .netrc macros)
grep
awstats
curl

First I need to determine which files to retrieve, and then retrieve them. To do this, the ftp client has a nice feature: macros can be defined in the .netrc file with “macdef” and then run by name when the client is invoked.

Edit $HOME/.netrc_template and enter the following:

[code]

machine xxx.xxx.xxx.xxx
login yyyyyyyy
password zzzzzzzz

macdef displaylistbydate
cd /logs
bin
ls -l
quit

macdef getlogfile
cd /logs
bin
get 2013-09-25-web.log
get 2013-09-26-web.log
get 2013-09-27-web.log
quit

[/code]
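Once this file is copied to $HOME/.netrc, a macro can be run non-interactively by piping its name, prefixed with “$”, into the ftp client (this is the same invocation the main script uses later):

[code]

# run the displaylistbydate macro defined in $HOME/.netrc
echo "\$ displaylistbydate" | ftp -v xxx.xxx.xxx.xxx

[/code]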

This does what I need. But how to programmatically determine which files to download? The filenames change when the logs roll over, so the list needs to be recalculated on each run.

For this, I wrote an additional bash script that queries the server and retrieves the relevant files.

The main shell script is as follows:

[code]

#!/bin/bash

# Ask the server for a listing of /logs, then keep only the current
# (non-archived) log files that match the date pattern
echo "\$ displaylistbydate" | ftp -v xxx.xxx.xxx.xxx > "$HOME/sydney_logs/all_log_files"
grep -v web.log.bz2 "$HOME/sydney_logs/all_log_files" | grep -o '2[0-9][0-9][0-9]-.*log' > "$HOME/sydney_logs/files_to_get"

# This part dynamically rebuilds the .netrc file to pull the above files
FILE=$HOME/sydney_logs/files_to_get
if [ ! -f "$FILE" ]; then
        echo "$FILE: does not exist"
        exit 1
elif [ ! -r "$FILE" ]; then
        echo "$FILE: cannot be read"
        exit 2
fi

# save stdin on fd 3, then read the list of filenames from $FILE
exec 3<&0
exec 0<"$FILE"
cp "$HOME/.netrc_template" "$HOME/.netrc"
chmod 600 "$HOME/.netrc"   # ftp refuses a .netrc that is readable by others
while read -r line
do
        # append one "get" command per log file to the getlogfile macro
        echo "        get $line" >> "$HOME/.netrc"
done

# finish off the macro; the trailing blank line is what terminates a macdef
echo "        quit" >> "$HOME/.netrc"
echo "" >> "$HOME/.netrc"

# actually retrieve the files
cd "$HOME/sydney_logs"
echo "\$ getlogfile" | ftp -v xxx.xxx.xxx.xxx

# copy each log file into place and let awstats process it
exec 3<&0
exec 0<"$FILE"
while read -r line
do
        echo "copying $line"
        cp "$line" "$HOME/awstats/mylog.log"
        perl "$HOME/awstats/update" < /dev/null   # /dev/null so it cannot consume the file list
done
cd "$HOME/awstats"
echo "Generating a report (may take a few mins)"
"$HOME/awstats/genreports"
cd "$HOME/sydney_logs"

# upload the report to the server
./uploadstats

[/code]

The grep pipeline looks for all files that aren’t archived bz2 files and that follow a certain date pattern. The results are saved to a file, which is then fed into the .netrc configuration file.
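To illustrate with a made-up listing line (the permissions, size and date are invented for the example):

[code]

# a hypothetical line from the server's "ls -l" output
line='-rw-r--r--  1 web web  482133 Sep 26 03:10 2013-09-25-web.log'
echo "$line" | grep -v web.log.bz2 | grep -o '2[0-9][0-9][0-9]-.*log'
# prints: 2013-09-25-web.log
# a line ending in web.log.bz2 would be dropped by the first grep

[/code]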

The process is to copy a clean template, “.netrc_template”, to a file called “.netrc” (the standard auto-login file read by the ftp client). For the automated version, the template’s getlogfile macro ends after the “bin” command; the script appends one “get” line per entry in the “files_to_get” file, then finishes off the macro with the command “quit” and a blank line, since a blank line is what terminates a macdef.
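With the filenames from the earlier example, the generated .netrc ends up looking something like this:

[code]

machine xxx.xxx.xxx.xxx
login yyyyyyyy
password zzzzzzzz

macdef displaylistbydate
cd /logs
bin
ls -l
quit

macdef getlogfile
cd /logs
bin
        get 2013-09-25-web.log
        get 2013-09-26-web.log
        quit

[/code]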

genreports calls the awstats package to build the HTML reports.
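genreports itself isn’t shown here, but as a minimal sketch, assuming a standard awstats install (the config name “sydney” and all paths are placeholders, not taken from the real setup):

[code]

#!/bin/bash
# hypothetical sketch of genreports; config name and paths are assumptions
perl "$HOME/awstats/tools/awstats_buildstaticpages.pl" \
    -config=sydney \
    -awstatsprog="$HOME/awstats/cgi-bin/awstats.pl" \
    -dir="$HOME/awstats/reports"

[/code]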

uploadstats simply calls curl to upload the reports to the web server.
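Again not shown in the post, but something along these lines (the host, credentials and remote directory are placeholders):

[code]

#!/bin/bash
# hypothetical sketch of uploadstats; host, credentials and the
# remote directory are assumptions
for f in "$HOME"/awstats/reports/*.html; do
    curl -T "$f" --user yyyyyyyy:zzzzzzzz ftp://xxx.xxx.xxx.xxx/stats/
done

[/code]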

A bit hacky and very simple, but it has done the job for a number of years.
