
Get Yesterday’s date in MS DOS Batch file

A while back I was trying to figure out the best way to gather log files from some Amazon S3 buckets and some web servers I run.  These resources currently generate around 10-15GB of uncompressed log files daily.  Besides being fairly large, the S3 (and CloudFront) log files are numerous.  Any given bucket can easily generate 1,000 or more log files per day – but that’s a whole other story. . .

Anyway, I wanted to run a process sometime after midnight that would gather and zip the previous day’s files and stash the zipped files in another location for archival.  It’s pretty easy to calculate the previous day’s date in the middle of the month, but what if it’s the first of the month, or the first of the year, and what about leap years?  So I searched around the web a bit and came across a great solution to this issue on Experts Exchange (Get Yesterday date in MS DOS Batch file).  Thanks to SteveGTR for this one.
I have modified the original script a bit to suit my needs.  Most notably, at the end of the script I create two variables, IISDT and AWSDT, to match the IIS and Amazon Web Services (S3 and CloudFront) log date formats, respectively.  I use this in a simple batch file which is executed like “gather_log_files.bat 1”.  The number “1” is passed into the script, which calculates the date of “1” day before the current date.  Of course you could pass any number in there to generate a date x days in the past.  It’s very slick. NOTE: If you don’t specify a number after the batch file name, “1” is assumed.

So, without further ado, here’s the script.

@echo off

set yyyy=

set $tok=1-3
for /f "tokens=1 delims=.:/-, " %%u in ('date /t') do set $d1=%%u
if "%$d1:~0,1%" GTR "9" set $tok=2-4
for /f "tokens=%$tok% delims=.:/-, " %%u in ('date /t') do (
for /f "skip=1 tokens=2-4 delims=/-,()." %%x in ('echo.^|date') do (
set %%x=%%u
set %%y=%%v
set %%z=%%w
set $d1=
set $tok=))

if "%yyyy%"=="" set yyyy=%yy%
if /I %yyyy% LSS 100 set /A yyyy=2000 + 1%yyyy% - 100

set CurDate=%mm%/%dd%/%yyyy%
set dayCnt=%1

if "%dayCnt%"=="" set dayCnt=1

REM Subtract your days here
set /A dd=1%dd% - 100 - %dayCnt%
set /A mm=1%mm% - 100

:CHKDAY
if /I %dd% GTR 0 goto DONE
set /A mm=%mm% - 1
if /I %mm% GTR 0 goto ADJUSTDAY
set /A mm=12
set /A yyyy=%yyyy% - 1

:ADJUSTDAY
if %mm%==1 goto SET31
if %mm%==2 goto LEAPCHK
if %mm%==3 goto SET31
if %mm%==4 goto SET30
if %mm%==5 goto SET31
if %mm%==6 goto SET30
if %mm%==7 goto SET31
if %mm%==8 goto SET31
if %mm%==9 goto SET30
if %mm%==10 goto SET31
if %mm%==11 goto SET30
REM ** Month 12 falls through

:SET31
set /A dd=31 + %dd%
goto CHKDAY

:SET30
set /A dd=30 + %dd%
goto CHKDAY

:LEAPCHK
set /A tt=%yyyy% %% 4
if not %tt%==0 goto SET28
set /A tt=%yyyy% %% 100
if not %tt%==0 goto SET29
set /A tt=%yyyy% %% 400
if %tt%==0 goto SET29

:SET28
set /A dd=28 + %dd%
goto CHKDAY

:SET29
set /A dd=29 + %dd%
goto CHKDAY

:DONE
if /I %mm% LSS 10 set mm=0%mm%
if /I %dd% LSS 10 set dd=0%dd%

REM Set IIS and AWS date variables
set IISDT=%yyyy:~2,2%%mm%%dd%
set AWSDT=%yyyy%-%mm%-%dd%

The results would look like:

IIS Date: 100727

AWS Date: 2010-07-27
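As a sanity check on the batch logic, here is the same calculation sketched in Python (my own addition, not part of the original script); datetime.timedelta handles month boundaries and leap years automatically, which is exactly the bookkeeping the batch file has to do by hand:

```python
from datetime import date, timedelta

def log_dates(days_back=1, today=None):
    """Return (IISDT, AWSDT) for the date `days_back` days before `today`."""
    if today is None:
        today = date.today()
    target = today - timedelta(days=days_back)
    iisdt = target.strftime("%y%m%d")      # two-digit year, IIS log style
    awsdt = target.strftime("%Y-%m-%d")    # S3/CloudFront log style
    return iisdt, awsdt

print(log_dates(1, date(2010, 7, 28)))  # → ('100727', '2010-07-27')
```

Passing a fixed date such as `date(2012, 3, 1)` shows the leap-year case working: one day back is 2012-02-29.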


Web log analysis and statistics for Amazon S3 with S3STAT

I’ve been using Amazon Web Services for several months now. Like anything else I run, I need to know what’s going on with my services – what’s being downloaded, how often, from where, etc. In the middle of last month I finally found a service that bridges the gap, giving a good view into what’s going on with my S3 buckets. S3STAT is a service that takes the detailed server access logs provided by Amazon’s CloudFront and Simple Storage Service (S3) and translates them into human-readable statistics, reports, and graphs.

Every night they download my access logs, translate them, sort them, and run them through Webalizer, then they stick the processed log files right back into my Amazon S3 Bucket for me to view.

S3STAT provides the following benefits:

  • Get Access to your Cloudfront and S3 Web Logs in a format that you can use. S3STAT will set it all up for you automatically.
  • Track your Cloudfront and S3 Usage Statistics through graphical reports generated on a nightly basis.
  • Identify performance bottlenecks caused by slow loading content. S3STAT keeps statistics on S3 processing time and system latency.
  • Consolidate your web usage reports by downloading nightly log files in Common Logfile Format and Combined Logfile Format.
  • Industry Standard web statistics provided by Webalizer, the leading web log analysis and reporting package.
  • They do all this for only $5 a month!
S3STAT provides two ways to process logs.  The first is to give them direct access to my S3 account, but being the paranoid admin that I am, I didn’t like this idea.  They even acknowledge this may be an issue: “Don’t Trust Us?  If you really don’t want to hand over your S3 credentials, it is still possible to use S3STAT in self-managed mode.”  I opted for self-managed mode, although it’s a pain in the you-know-what to set up.

Enter CloudBerry Explorer for Amazon S3.  The good folks at CloudBerry Labs have integrated the S3STAT self-managed setup and configuration into CloudBerry Explorer.  In CloudBerry Explorer, right-click the S3 bucket you want to use with S3STAT and select Properties (you can also get there by right-clicking the bucket and selecting Logging, then Log Settings).

On the CloudFront Logging tab choose Use S3Stat logging.  Click OK.

Next, log on to your S3STAT account (make sure you set it up to use self-managed mode).  From your main account page select Add an S3 bucket.  Enter your bucket name and click Verify.
Sit back, relax, and wait a couple of days for the stats to accumulate and be processed by S3STAT.  Once you have some stats you can access them easily through links (one for each bucket) on your S3STAT account page.
This will take you to your stats page, which is actually stored right in your analyzed bucket.
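For what it’s worth, what the CloudBerry GUI is turning on is ordinary S3 server access logging, so the same step can be scripted.  Here is a rough sketch using Python and boto3 (a modern SDK, not something this original setup used; the helper and all bucket names are hypothetical, and this approximates what the GUI does rather than S3STAT’s exact procedure):

```python
def enable_access_logging(s3_client, source_bucket, log_bucket, log_prefix):
    """Enable S3 server access logging on source_bucket, delivering
    log files to log_bucket under log_prefix."""
    config = {
        "LoggingEnabled": {
            "TargetBucket": log_bucket,
            "TargetPrefix": log_prefix,
        }
    }
    # boto3's S3 client exposes this operation as put_bucket_logging.
    s3_client.put_bucket_logging(
        Bucket=source_bucket,
        BucketLoggingStatus=config,
    )
    return config
```

With real credentials you would call it as `enable_access_logging(boto3.client("s3"), "my-content-bucket", "my-log-bucket", "logs/")` – again, the bucket names here are made up for illustration.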

So far I’ve been fairly pleased with S3STAT, especially considering I haven’t paid a dime during the 30-day free trial.  However, I have noticed one issue – on a few days I have little to no stats, even though I know I had traffic.  I’m not sure whether this is a bug in S3STAT or something else.  I’m not a huge fan of the Webalizer interface, but I can deal with it.  Otherwise S3STAT has been great and has saved me a ton of time by not having to set up my own analytics for my S3 buckets.

One other small drawback – at the moment there is no way to configure CloudFront distributions in self-managed mode.  According to S3STAT, “Cloudfront doesn’t yet allow you to change the ACL for delivered logfiles, which means we can’t read them unless we have your AWS credentials.  Never fear, though. We’re working with the Cloudfront team to make this possible.”

I definitely recommend giving it a try!