{"id":420,"date":"2013-09-28T19:05:58","date_gmt":"2013-09-28T09:05:58","guid":{"rendered":"http:\/\/www.alphastar.net.au\/blog\/?p=420"},"modified":"2017-11-01T09:52:52","modified_gmt":"2017-11-01T09:52:52","slug":"processing-web-logs","status":"publish","type":"post","link":"https:\/\/alphastar.net.au\/weblog\/2013\/09\/28\/processing-web-logs\/","title":{"rendered":"processing web logs"},"content":{"rendered":"<p>For a church website I maintain I needed to find a way to produce graphical log data from raw log files. The web host it currently sits on does not have awstats or cPanel available, so I devised a way to pull down the logs via FTP, process them locally on an Ubuntu box and post the results back up to the web host.<\/p>\n<p><span style=\"text-decoration: underline;\">Tools used<\/span><br \/>\nBash script<br \/>\nncftp<br \/>\ngrep<\/p>\n<p><!--more--><\/p>\n<p>First I need to determine which files to retrieve, and then retrieve them. To do this, ncftp has a nice feature that allows macros defined in the .netrc file to be run when it is invoked.<\/p>\n<p>Edit $HOME\/.netrc_template and enter the following:<\/p>\n<p>[code]<\/p>\n<p>machine xxx.xxx.xxx.xxx<br \/>\nlogin yyyyyyyy<br \/>\npassword zzzzzzzz<\/p>\n<p>macdef displaylistbydate<br \/>\ncd \/logs<br \/>\nbin<br \/>\nls -l<br \/>\nquit<\/p>\n<p>macdef getlogfile<br \/>\ncd \/logs<br \/>\nbin<br \/>\n<span style=\"text-decoration: line-through;\"><em> <span style=\"color: #99cc00;\">get 2013-09-35-web.log<br \/>\nget 2013-09-36-web.log<br \/>\nget 2013-09-37-web.log<\/span><br \/>\n<\/em><span style=\"color: #ff0000;\">quit<\/span><\/span><br \/>\n[\/code]<\/p>\n<p>This does what I need. But how can the script programmatically determine which files to download? 
The filenames change when the logs roll over, so the list of files has to be recalculated on each run.<\/p>\n<p>For this, an additional bash script queries the server and retrieves the files that are relevant.<\/p>\n<p>The main shell script is as follows:<\/p>\n<p>[code]<\/p>\n<p>#!\/bin\/bash<\/p>\n<p># This part gets the list of applicable log files from the ftp server<br \/>\necho &quot;\\$ displaylistbydate&quot; | ftp -v xxx.xxx.xxx.xxx &gt; &quot;$HOME\/sydney_logs\/all_log_files&quot;<br \/>\ngrep -v web.log.bz2 &quot;$HOME\/sydney_logs\/all_log_files&quot; | grep -o '2[0-9][0-9][0-9]-.*log' &gt; &quot;$HOME\/sydney_logs\/files_to_get&quot;<\/p>\n<p># This part dynamically rebuilds the .netrc file to pull the above files<br \/>\nFILE=&quot;$HOME\/sydney_logs\/files_to_get&quot;<br \/>\nif [ ! -f &quot;$FILE&quot; ]; then<br \/>\necho &quot;$FILE: does not exist&quot;<br \/>\nexit 1<br \/>\nelif [ ! -r &quot;$FILE&quot; ]; then<br \/>\necho &quot;$FILE: cannot be read&quot;<br \/>\nexit 2<br \/>\nfi<\/p>\n<p>cp &quot;$HOME\/.netrc_template&quot; &quot;$HOME\/.netrc&quot;<br \/>\nwhile read -r line<br \/>\ndo<br \/>\n# append a get command for each log file to the getlogfile macro<br \/>\necho &quot;        get $line&quot; &gt;&gt; &quot;$HOME\/.netrc&quot;<br \/>\ndone &lt; &quot;$FILE&quot;<\/p>\n<p>echo &quot;        quit&quot; &gt;&gt; &quot;$HOME\/.netrc&quot;<br \/>\necho &quot;&quot; &gt;&gt; &quot;$HOME\/.netrc&quot;<\/p>\n<p># actually retrieve the files<br \/>\ncd &quot;$HOME\/sydney_logs&quot;<br \/>\necho &quot;\\$ getlogfile&quot; | ftp -v xxx.xxx.xxx.xxx<\/p>\n<p># copy each log file in turn and let awstats process it<br \/>\nwhile read -r line<br \/>\ndo<br \/>\necho &quot;copying $line&quot;<br \/>\ncp &quot;$line&quot; &quot;$HOME\/awstats\/mylog.log&quot;<br \/>\nperl &quot;$HOME\/awstats\/update&quot;<br \/>\ndone &lt; &quot;$FILE&quot;<br \/>\ncd &quot;$HOME\/awstats&quot;<br \/>\necho &quot;Generating a report (may take a few mins)&quot;<br 
\/>\n$HOME\/awstats\/genreports<br \/>\ncd $HOME\/sydney_logs<\/p>\n<p># upload the report to the server<br \/>\n.\/uploadstats<\/p>\n<p>[\/code]<\/p>\n<p>The grep command looks for all files that aren&#8217;t archived bz2 files and that follow a certain date pattern. The results from the grep are saved to a file, which is then fed into the .netrc configuration file.<\/p>\n<p>The process is to copy a clean template, &#8220;.netrc_template&#8221;, to a file called &#8220;.netrc&#8221; (which is the standard config file for ncftp), append a get command for each entry in the &#8220;<span style=\"color: #99cc00;\">files_to_get<\/span>&#8221; file, and then finish off the macro by adding the command &#8220;<span style=\"color: #ff0000;\">quit<\/span>&#8221;.<\/p>\n<p><span style=\"font-family: courier;\">genreports<\/span> calls the awstats package.<\/p>\n<p><span style=\"font-family: courier;\">uploadstats<\/span> simply calls curl to upload the report to the web server.<\/p>\n<p>It is a bit hacky and very simple, but it has done the job for a number of years.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For a church website I maintain I needed to find a way to produce graphical log data from raw log files. 
The web host it currently sits on does not&#8230; <\/p>\n","protected":false},"author":1,"featured_media":769,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-420","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-coding"],"_links":{"self":[{"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/posts\/420","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/comments?post=420"}],"version-history":[{"count":28,"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/posts\/420\/revisions"}],"predecessor-version":[{"id":765,"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/posts\/420\/revisions\/765"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/media\/769"}],"wp:attachment":[{"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/media?parent=420"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/categories?post=420"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/alphastar.net.au\/weblog\/wp-json\/wp\/v2\/tags?post=420"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}