7-zip 7zip S3

7-zip (7zip) for Windows (7, 2008, etc.) Rocks!

I’ve been using 7-zip for a long time now.  It quickly became my favorite zip/unzip utility.  It’s quick, easy to use, open source, and best of all, FREE.

A few months ago I worked on a project to gather and zip log files from numerous S3 buckets and archive them in another location.  Initially I specified the “zip” compression method and the files were stored with a .zip extension.  One day I was experimenting with the log files and realized that “7zip” compression, while it took a little longer to complete, produced considerably smaller files.  In my case it doesn’t really matter how long the files take to compress, but I want them as small as possible since I’m using an S3 bucket for archival and want to spend as little as possible.

As an example take 30 files for June from one of my buckets.  The files were 5.6GB in total size as “.zip” files, but only 3.6GB as “.7z” files.  That’s about 35% less!  Granted, we are only talking about a few cents a month here, but as I said earlier I have several buckets and keep several months of logs from each bucket, so it adds up over time.  Definitely worth the additional processing time to use “7z” compression.
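The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch: the two sizes are the ones quoted above, and the function name `percent_saved` is made up for illustration.

```python
def percent_saved(before_gb: float, after_gb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_gb - after_gb) / before_gb * 100.0


zip_gb = 5.6     # total size of the 30 June files as .zip (figure from the post)
sevenz_gb = 3.6  # total size of the same files as .7z (figure from the post)

saved_gb = zip_gb - sevenz_gb
print(f"Saved {saved_gb:.1f} GB, {percent_saved(zip_gb, sevenz_gb):.1f}% smaller")
# prints: Saved 2.0 GB, 35.7% smaller
```

Multiplying the saved gigabytes by the number of buckets and months kept gives a feel for how the savings add up.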

Take a look at a file-by-file comparison:

Amazon Web Services AWS Bucket Policy JSON S3

S3 Bucket Policy to Restrict Access by Referrer, Yet Allow Direct Access to File(s)

Recently Amazon rolled out S3 Bucket Policies (see Access Policy Language), which allow finer control over access to S3 buckets, or to resources in buckets, than ACLs alone.  This was very timely, as I had a need arise to use a bucket policy just after it came out.  Basically I needed to block access to a single file, let’s call it xyz.htm, from certain referrers, yet allow all others.  After a little research and some trial-and-error I was able to define a policy which did just this:


However, this had the undesired effect of blocking direct access to the file, i.e., where there is no referrer, or the referrer is null.  This one took me a little longer to figure out, and a key piece of it was found in the Amazon developer forums.  I was then able to write a bucket policy which behaves as desired:

"Sid":"1- Allow direct access to xyz.htm - i.e. no referrer.",
"Sid":"2- Allow all referrers to xyz.htm except those listed.",

This policy effectively allows direct access to xyz.htm (null or “no” referrer), and allows access to all referrers except those explicitly listed in the Sid:2 section.  One important note is that “public” read access must not be set in the ACL for this file as it will allow anyone access, effectively bypassing this policy.
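As a rough sketch of the shape such a two-statement policy can take, here it is built in Python and printed as JSON. Only the two Sid strings come from the policy described above. The bucket name, the blocked referrer pattern, the function name `build_referrer_policy`, and the choice of the `Null` and `StringNotLike` condition operators on `aws:Referer` are assumptions for illustration, not necessarily the exact policy that was applied.

```python
import json


def build_referrer_policy(bucket, key, blocked_referrers):
    """Build a bucket policy dict with one statement per case described above."""
    resource = f"arn:aws:s3:::{bucket}/{key}"
    # Fields shared by both statements: anyone may GET the one object.
    public_read = {
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "s3:GetObject",
        "Resource": resource,
    }
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "1- Allow direct access to xyz.htm - i.e. no referrer.",
                **public_read,
                # Assumed operator: matches requests that carry no Referer header.
                "Condition": {"Null": {"aws:Referer": "true"}},
            },
            {
                "Sid": "2- Allow all referrers to xyz.htm except those listed.",
                **public_read,
                # Assumed operator: matches when the Referer fits none of the patterns.
                "Condition": {"StringNotLike": {"aws:Referer": list(blocked_referrers)}},
            },
        ],
    }


# Placeholder bucket and referrer, for illustration only.
policy = build_referrer_policy("example-bucket", "xyz.htm", ["http://blocked.example.com/*"])
print(json.dumps(policy, indent=2))
```

Printing the policy through `json.dumps` also guarantees the result is at least syntactically valid JSON before it is pasted into a tool that applies it.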


  • Amazon S3 bucket policies use JSON.  If you aren’t familiar with JSON, as I wasn’t, you can read more here.
  • I found a handy JSON Formatter and Validator – to do just that. . .
  • Since Amazon doesn’t provide an easy method for us non-programmers to apply bucket policies, I found CloudBerry S3 Bucket Explorer Pro essential and simple to use for applying them.
  • Sometimes when I applied a policy to test, I would receive the message “invalid aspen elements,” which basically means something is wrong; usually one of the required elements was either missing or incorrect.  Interestingly, no results were found for this message using Google.
amazon s3 log analysis AWS Batch File Command Line S3

Get Yesterday’s date in MS DOS Batch file

A while back I was trying to figure out the best way to gather some log files from Amazon S3 buckets and some web servers I run.  These resources are currently generating around 10-15GB of uncompressed log files daily.  Besides being fairly large, the S3 (and CloudFront) log files are numerous.  Any given bucket can easily generate 1,000 or more log files per day – that’s a whole other story. . .

Anyway, I wanted to be able to run a process sometime after midnight that would gather and zip the previous day’s files and stash the zipped files in another location for archival.  It’s pretty easy to calculate the previous day’s date if it’s in the middle of the month, but what if it’s the first of the month, first of the year, and what about leap year, etc., etc. . . ?  So I searched around the web a bit and came across a great solution to this issue on Experts Exchange (Get Yesterday date in MS DOS Batch file).  Thanks to SteveGTR for this one.
I have modified the original script a bit to suit my needs.  Most notably, at the end of the script I create two variables, IISDT and AWSDT, to match the IIS and Amazon Web Services (S3 and CloudFront) log date formats, respectively.  I use this in a simple batch file which is executed like “gather_log_files.bat 1”.  The number “1” is passed into the script, which calculates the date “1” day before the current date.  Of course you could pass any number in there to generate a date x days in the past.  It’s very slick. NOTE: If you don’t specify a number after the batch file name, “1” is assumed.

So, without further ado, here’s the script.

@echo off

set yyyy=

REM Read today's date. The prompt printed by "date" names the fields
REM (mm, dd, yy or yyyy), and those names become the variable names.
set $tok=1-3
for /f "tokens=1 delims=.:/-, " %%u in ('date /t') do set $d1=%%u
if "%$d1:~0,1%" GTR "9" set $tok=2-4
for /f "tokens=%$tok% delims=.:/-, " %%u in ('date /t') do (
for /f "skip=1 tokens=2-4 delims=/-,()." %%x in ('echo.^|date') do (
set %%x=%%u
set %%y=%%v
set %%z=%%w
set $d1=
set $tok=))

REM Normalize a two-digit year to four digits
if "%yyyy%"=="" set yyyy=%yy%
if /I %yyyy% LSS 100 set /A yyyy=2000 + 1%yyyy% - 100

set CurDate=%mm%/%dd%/%yyyy%
set dayCnt=%1

if "%dayCnt%"=="" set dayCnt=1

REM Subtract your days here
REM Prefixing 1 and subtracting 100 keeps 08 and 09 from being read as octal
set /A dd=1%dd% - 100 - %dayCnt%
set /A mm=1%mm% - 100

REM Loop: while the day is zero or negative, step back one month
:CHKDAY
if /I %dd% GTR 0 goto DONE
set /A mm=%mm% - 1
if /I %mm% GTR 0 goto ADJUSTDAY
set /A mm=12
set /A yyyy=%yyyy% - 1

REM Add the number of days in the month we stepped back into
:ADJUSTDAY
if %mm%==1 goto SET31
if %mm%==2 goto LEAPCHK
if %mm%==3 goto SET31
if %mm%==4 goto SET30
if %mm%==5 goto SET31
if %mm%==6 goto SET30
if %mm%==7 goto SET31
if %mm%==8 goto SET31
if %mm%==9 goto SET30
if %mm%==10 goto SET31
if %mm%==11 goto SET30
REM ** Month 12 falls through

:SET31
set /A dd=31 + %dd%
goto CHKDAY

:SET30
set /A dd=30 + %dd%
goto CHKDAY

REM Leap year: divisible by 4, except centuries not divisible by 400
:LEAPCHK
set /A tt=%yyyy% %% 4
if not %tt%==0 goto SET28
set /A tt=%yyyy% %% 100
if not %tt%==0 goto SET29
set /A tt=%yyyy% %% 400
if %tt%==0 goto SET29

:SET28
set /A dd=28 + %dd%
goto CHKDAY

:SET29
set /A dd=29 + %dd%
goto CHKDAY

:DONE
if /I %mm% LSS 10 set mm=0%mm%
if /I %dd% LSS 10 set dd=0%dd%

REM Set IIS and AWS date variables
set IISDT=%yyyy:~2,2%%mm%%dd%
set AWSDT=%yyyy%-%mm%-%dd%

The results would look like:

IIS Date: 100727

AWS Date: 2010-07-27

Amazon Web Services CloudBerry EC2 S3 S3.exe

Amazon S3 Command Line Utilities for Windows

I’ve searched high and low for a good all-around command line utility to interact with Amazon S3 buckets from Windows.  While I’m still searching for just the right utility for me, here are a few which I use from time to time.  Why use more than one, you ask?  Well, since I haven’t found just the right one for all occasions, I use the one that works best for the particular task at hand.

S3.exe is a Windows command-line utility for Amazon’s S3 & EC2 web services that requires no installation, is a single .EXE file with no DLLs, and requires only .NET 2.0 or Mono, so it will work on a plain Windows installation.

Key Features

  • Efficiently uploads and downloads large numbers of files (or whole directories) between Amazon S3 and Windows PCs.
  • Everything is in one .EXE. Nothing to install or configure, just download it where it’s needed and run it.
  • Doesn’t require anything except .NET 2.0 or Mono.
  • Works well in an automated backup solution or as an ad-hoc system administration tool.
  • Can split large files into chunks for upload without creating any temporary files on disk.
  • Can use HTTP HEAD command to quickly determine which files don’t need to be uploaded because they haven’t been updated (/sync).
  • Support for various EC2 operations as well.
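To give a feel for how a sync check like the /sync feature can work, here is a Python sketch that decides which local files need uploading. It is not S3.exe's actual logic, which isn't described here. It assumes the comparison is made against the ETag returned by an HTTP HEAD request, and that the ETag is the hex MD5 of the object (true for objects uploaded with a simple, non-multipart PUT). The function names and the dictionary inputs are hypothetical, and no network calls are made.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def files_to_upload(local_files: dict, remote_etags: dict) -> list:
    """Return names whose content is missing remotely or differs from it.

    local_files:  file name -> file contents (bytes)
    remote_etags: file name -> ETag string as returned by a HEAD request
    """
    needed = []
    for name, data in sorted(local_files.items()):
        # ETags arrive wrapped in double quotes, so strip them before comparing.
        etag = remote_etags.get(name, "").strip('"')
        if etag != md5_hex(data):
            needed.append(name)
    return needed


local = {"a.log": b"unchanged", "b.log": b"edited locally", "c.log": b"brand new"}
remote = {
    "a.log": '"' + md5_hex(b"unchanged") + '"',
    "b.log": '"' + md5_hex(b"older version") + '"',
}
print(files_to_upload(local, remote))  # ['b.log', 'c.log']
```

A HEAD request returns only headers, which is why this kind of check is cheap compared with downloading each object to compare it.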

CloudBerry Explorer PowerShell Snap-in
CloudBerry Explorer offers a PowerShell extension to manage file operations across Amazon Simple Storage Service (Amazon S3) and the file system.  The CloudBerry Explorer PowerShell Snap-in gives access to the majority of Amazon S3 functionality. You can combine CloudBerry Explorer commands with PowerShell commands. PowerShell is designed to operate on .NET objects, so you are not limited by command syntax. You can write complicated scripts with loops and conditions. You can schedule periodic tasks like data backup or cleanup.

#Sh3ll (Amazon S3 command shell for C#)
#Sh3ll (pronounced sharp-shell) is a C# based command shell for managing your Amazon S3 objects.  It is open source and provided by SilvaSoft (click to download #sh3ll and for more information). #Sh3ll is built upon the Amazon S3 REST C# library, and it runs on both .NET 1.1 and .NET 2.0.

Also from SilvaSoft:

  • Sh3ll – Amazon S3 command shell for Java
  • rSh3ll – Amazon S3 command shell for Ruby
amazon s3 log analysis Amazon Web Services AWS CloudBerry CloudFront Linux S3 S3STAT Webalizer Windows

Web log analysis and statistics for Amazon S3 with S3STAT

I’ve been using Amazon Web Services for several months now. Like anything else I need to know what’s going on with my services – what’s being downloaded, how often, from where, etc. In the middle of last month I finally found a service which bridges the gap allowing a good view into what’s going on with my S3 buckets. S3STAT is a service that takes the detailed server access logs provided by Amazon’s Cloudfront and Simple Storage Service (S3), and translates them into human readable statistics, reports and graphs.

Every night they download my access logs, translate them, sort them, and run them through Webalizer, then they stick the processed log files right back into my Amazon S3 Bucket for me to view.

S3STAT provides the following benefits

  • Get Access to your Cloudfront and S3 Web Logs in a format that you can use. S3STAT will set it all up for you automatically.
  • Track your Cloudfront and S3 Usage Statistics through graphical reports generated on a nightly basis.
  • Identify performance bottlenecks caused by slow loading content. S3STAT keeps statistics on S3 processing time and system latency.
  • Consolidate your web usage reports by downloading nightly log files in Common Logfile Format and Combined Logfile Format.
  • Industry Standard web statistics provided by Webalizer, the leading web log analysis and reporting package.
  • They do all this for only $5 a month!
S3STAT provides two ways to process logs.  The first is to give them direct access to my S3 account, but being the paranoid admin that I am, I didn’t like this idea.  They even acknowledge this may be an issue, “Don’t Trust Us?  If you really don’t want to hand over your S3 credentials, it is still possible to use S3STAT in self-managed mode.”  I opted for the self-managed mode, although it’s a pain in the you-know-what to set up.

Enter CloudBerry Explorer for Amazon S3.  The good folks at CloudBerry Labs have integrated the S3STAT self-managed setup and configuration into CloudBerry Explorer.  In CloudBerry Explorer just right-click the S3 bucket you want to use with S3STAT and select Properties (you can also get there by right-clicking the bucket and selecting Logging, Log Settings).

On the CloudFront Logging tab choose Use S3Stat logging.  Click OK.

Next, log on to your S3STAT account (make sure you set it up to use self-managed mode).  From your main account page select Add an S3 bucket.  Enter your bucket name and click Verify.
Sit back, relax and wait a couple of days for the stats to accumulate and to be processed by S3STAT.  Once you have some stats you can access them easily through links (for each bucket) from your S3STAT account page.
This will take you to your stats page, which is actually stored right in your analyzed bucket.

So far I’ve been fairly pleased with S3STAT, especially considering I haven’t paid a dime during the 30-day free trial.  However, I have noticed one issue: on a few of the days I have little to no stats, while I know I’ve had traffic.  I’m not sure if this is a bug with S3STAT or something else.  I’m not a huge fan of the Webalizer interface, but I can deal with it.  Otherwise S3STAT has been great and saved me a ton of time by not having to set up my own analytics for my S3 buckets.

One other small drawback – At the moment, there is not a way to configure Cloudfront distributions in self-managed mode.  According to S3STAT, “Cloudfront doesn’t yet allow you to change the ACL for delivered logfiles, which means we can’t read them unless we have your AWS credentials.  Never fear, though. We’re working with the Cloudfront team to make this possible.”

I definitely recommend giving it a try!

Amazon Web Services AWS AWStats Linux S3 Windows

Analyzing Amazon S3 Logs with AWStats

The Amazon Simple Storage Service (Amazon S3) provides virtually limitless storage accessible over the Internet. Along with this functionality, however, comes the need to understand how the data stored on Amazon S3 is being used. Amazon S3 supports logging each request made to a given Amazon S3 bucket to a text file. Out of the box, there is no built-in functionality to process these logs, but AWStats—a free log file parsing and analysis package—provides a solution. This article shows how to configure AWStats to process and display Amazon S3 access logs. When you’ve finished reading, you will be able to graphically analyze Amazon S3 access log data.

Entire article.

Amazon Web Services AWS CLI Command Line EC2 Encryption Linux S3 SSL Windows

Glossary of Amazon EC2 terms

Amazon machine image (AMI)
An Amazon Machine Image (AMI) is an encrypted machine image stored in Amazon S3. It contains all the information necessary to boot instances of your software.

Amazon EBS
A type of storage that enables you to create volumes that can be mounted as devices by Amazon EC2 instances. Amazon EBS volumes behave like raw unformatted external block devices. They have user supplied device names and provide a block device interface. You can load a file system on top of Amazon EBS volumes, or use them just as you would use a block device.

Availability Zone
A distinct location within a region that is engineered to be insulated from failures in other Availability Zones and provides inexpensive, low latency network connectivity to other Availability Zones in the same region.

compute unit
An Amazon-generated measure that enables you to evaluate the CPU capacity of different Amazon EC2 instance types.

See Amazon EBS.

Elastic Block Store
See Amazon EBS.

elastic IP address
A static public IP address designed for dynamic cloud computing. Elastic IP addresses are associated with your account, not specific instances. Any elastic IP addresses that you associate with your account remain associated with your account until you explicitly release them. Unlike traditional static IP addresses, however, elastic IP addresses allow you to mask instance or Availability Zone failures by rapidly remapping your public IP addresses to any instance in your account.

ephemeral store
See instance store.

explicit launch permission
Launch permission granted to a specific user.

See security group.

instance store
Every instance includes a fixed amount of storage space on which you can store data. This is not designed to be a permanent storage solution. If you need a permanent storage system, use Amazon EBS.

instance type
A specification that defines the memory, CPU, storage capacity, and hourly cost for an instance. Some instance types are designed for standard applications while others are designed for CPU-intensive applications.

gibibyte (GiB)
A contraction of giga binary byte, a gibibyte is 2^30 bytes or 1,073,741,824 bytes. A gigabyte is 10^9 or 1,000,000,000 bytes. So yes, Amazon has bigger bytes.

See Amazon machine image.

instance
Once an AMI has been launched, the resulting running system is referred to as an instance. All instances based on the same AMI start out identical and any information on them is lost when the instances are terminated or fail.

instance store
The disk storage associated with an instance. In the event an instance fails or is terminated (not simply rebooted), all content on the instance store is deleted.

Also known as a security group, groups define firewall rules that can be shared among a group of instances that have similar security requirements. The group is specified at instance launch.

launch permission
AMI attribute allowing users to launch an AMI

Amazon EC2 instances are available for many operating platforms, including Linux, Solaris, Windows, and others.

paid AMI
An AMI that you sell to other Amazon EC2 users. For more information, refer to the Amazon DevPay Developer Guide.

private IP address
All Amazon EC2 instances are assigned two IP addresses at launch: a private address (RFC 1918) and a public address that are directly mapped to each other through Network Address Translation (NAT).

public AMI
An AMI that all users have launch permissions for.

public data sets
Sets of large public data sets that can be seamlessly integrated into AWS cloud-based applications. Amazon stores the data sets at no charge to the community and, like all AWS services, users pay only for the compute and storage they use for their own applications. These data sets currently include data from the Human Genome Project, the U.S. Census, Wikipedia, and other sources.

public IP address
All Amazon EC2 instances are assigned two IP addresses at launch: a private address (RFC 1918) and a public address that are directly mapped to each other through Network Address Translation (NAT).

region
A geographical area in which you can launch instances (e.g., US, EU).

reservation
A collection of instances started as part of the same launch request.

Reserved Instance
An additional Amazon EC2 pricing option. With Reserved Instances, you can make a low one-time payment for each instance to reserve and receive a significant discount on the hourly usage charge for that instance.

security group
A security group is a named collection of access rules. These access rules specify which ingress (i.e., incoming) network traffic should be delivered to your instance. All other ingress traffic will be discarded.

shared AMI
AMIs that developers build and make available for other AWS developers to use.

snapshot
Amazon EBS provides the ability to create snapshots or backups of your Amazon EBS volumes and store them in Amazon S3. You can use these snapshots as the starting point for new Amazon EBS volumes and to protect your data for long term durability.

supported AMIs
These AMIs are similar to paid AMIs, except that you charge for software or a service that customers use with their own AMIs.

tebibyte (TiB)
A contraction of tera binary byte, a tebibyte is 2^40 bytes or 1,099,511,627,776 bytes. A terabyte is 10^12 or 1,000,000,000,000 bytes. So yes, Amazon has bigger bytes.
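The byte counts in the gibibyte and tebibyte entries are easy to verify. The short Python sketch below (constant names are just for illustration) computes both binary units and how much larger each one is than its decimal counterpart.

```python
GIB = 2 ** 30   # gibibyte, in bytes
GB = 10 ** 9    # gigabyte, in bytes
TIB = 2 ** 40   # tebibyte, in bytes
TB = 10 ** 12   # terabyte, in bytes

print(f"1 GiB = {GIB:,} bytes")  # 1 GiB = 1,073,741,824 bytes
print(f"1 TiB = {TIB:,} bytes")  # 1 TiB = 1,099,511,627,776 bytes

# How much bigger the binary unit is than the decimal one, as a percentage.
gib_extra = (GIB / GB - 1) * 100
tib_extra = (TIB / TB - 1) * 100
print(f"GiB is {gib_extra:.1f}% larger than GB; TiB is {tib_extra:.1f}% larger than TB")
```

The gap widens with each step up in unit, which is why the difference matters more for terabyte-scale storage than for gigabyte-scale storage.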

Amazon Web Services CLI Command Line EC2 Linux S3 Windows

Amazon Elastic Compute Cloud (EC2) Command Line Tools Reference

The Amazon Elastic Compute Cloud Command Line Tools Reference Guide provides the syntax, a description, options, and usage examples for each command line tool. This section describes who should read this guide, how the guide is organized, and other resources related to Amazon Elastic Compute Cloud.

The Amazon Elastic Compute Cloud is occasionally referred to within this guide as simply “Amazon EC2”; all copyrights and legal protections still apply.

View guide here.

Amazon Elastic Compute Cloud
Command Line Tools Reference (straight to the meat and potatoes)

Amazon Web Services CLI Command Line EC2 Linux S3 Windows

How to run Bucket Commander: A command line interface for Amazon S3

Bucket Commander is a command line tool for Amazon S3.

Bucket Commander needs a configuration file, which can be created using Bucket Explorer’s UI.

Bucket Commander takes three arguments: “-action”, “-authenticate” and “-emailprofile”.

“-emailprofile” is an optional argument. You only need to specify it when you have configured an Email profile to receive reports of Bucket Commander operations (Upload, Download and Copy) by email.
Valid values for “-action” are:

  • upload
  • download
  • copy

To run Bucket Commander, at least one credential must be saved.

If only a single credential is saved, the “-authenticate” argument is optional.

For “-authenticate”, specify the nickname that you see in the “quick connect” drop-down in Bucket Explorer’s UI.
For Bucket Commander to work it needs the config folder and .Lic file, i.e. bucketcommander.xml and bucketexplorer.xml. Upload/Download/Copy details are picked from the commander XML and authentication details are picked from the bucketexplorer XML.

If BucketCommander.exe runs on a different machine, it will not be able to decrypt the saved credentials, so it will prompt you to update them by entering your Access Key and Secret Key.

For “-emailprofile”, specify the profile name that you saved in the Email profile configuration in Bucket Explorer’s UI.

How to send report with Bucket Commander

You can specify more than one Email Profile, separated by commas, to have the report of Bucket Commander operations emailed to each specified profile.
An example of a working command looks like:
Command on Windows

Bucketcommander.exe -action:upload/download/copy [-authenticate:nick-name][-emailprofile:profilename1,profilename2]

Command on Linux -action:upload/download/copy [-authenticate:nick-name][-emailprofile:profilename1,profilename2]

Note: On Linux you can open a terminal from Applications->Accessories->Terminal.
Command on Mac OSX

java -jar BucketExplorer.jar -action:upload/download/copy [-authenticate:nick-name][-emailprofile:profilename1,profilename2]

Note: On Mac OSX you can open terminal from Applications->Utilities->Terminal in Finder.
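
If Bucket Commander is launched from a script, the argument rules above can be captured in a small helper. This is a Python sketch only: the function name `build_args` is made up, and the nickname and profile names in the usage line are placeholders. It builds the argument list and does not run anything.

```python
VALID_ACTIONS = ("upload", "download", "copy")


def build_args(action, nickname=None, email_profiles=()):
    """Build the Bucket Commander argument list described above."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    args = [f"-action:{action}"]
    if nickname:
        # Optional when only a single credential is saved.
        args.append(f"-authenticate:{nickname}")
    if email_profiles:
        # Several profiles are joined with commas.
        args.append("-emailprofile:" + ",".join(email_profiles))
    return args


# Windows form of the command, with placeholder nickname and profile names.
command = ["Bucketcommander.exe"] + build_args("upload", "my-nick", ["ops", "backup"])
print(" ".join(command))
# prints: Bucketcommander.exe -action:upload -authenticate:my-nick -emailprofile:ops,backup
```

On Mac OS X the same arguments would follow `java -jar BucketExplorer.jar` instead of the .exe name.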

Download bucket explorer for windows, linux and mac osx

Amazon Web Services AWS EC2 Encryption FTP Linux Passwords PuTTY S3 SSH Windows

A quick overview of PuTTY and SSH for AWS Newbies

Linux Access with SSH & PuTTY

This post will attempt to explain what SSH and PuTTY are so that, as a user, you understand the terminology of AWS and can be productive in the environment. This post will not attempt to make you an expert in SSH. For best practices in implementing SSH, I strongly recommend a book dedicated to hardening *nix (Linux, Unix, Solaris, etc).


In the early days of networking, not that long ago really, very simple tools were used to work with remote computers: telnet as a console, ftp for file copying, rsh for remote command execution, and others. These tools were easy to configure and use. They were client-server in that a software component needed to run on both the local machine (client) and the remote machine (server).

While easy to use, they were very insecure. They made no pretense at verifying that the calling host really was the calling host. Everything was username/password based and both the username and the password were passed around the network in cleartext. If you intercepted the little data packages that were being routed around the network (with a sniffer for example), you would be able to extract the login credentials. Even if you encrypted all of your data, your credentials were still in the clear.

SSH is an attempt (quite successful) to fix those insecurities without making things any more complex than they need to be. SSH stands for Secure SHell. However, SSH is not really a command shell; rather, it is a protocol that encrypts communications. That means that programs that use SSH can work like telnet or ftp but will be more secure.

Note: Technically, SSH is also a tool. There is a client terminal program called SSH. It’s a non-graphical command line tool that provides a window which executes a command shell on the remote system.

SSH offers multiple modes of connecting but for the purposes of AWS, we will talk about key based access. To make things more secure, EC2 uses a key based authentication. Before starting an instance, you need to create a key pair.

Note: The explanation of SSH below is a gross oversimplification. I am just trying to give you a feel for what is going on. If you really want to understand the technical details, I really do recommend that you purchase a book. My personal recommendation is SSH, The Secure Shell: The Definitive Guide from O’Reilly.

When an instance starts up for the first time, EC2 copies the ssh key that you created to the proper directory on the remote server. The remote server will be running the SSH Server software.

You will then use an SSH client to connect to the server. The client will ask for some information proving that the server really is who it says it is. The first time you connect to a server, the client won’t have that information available, so it will prompt you to verify that the server is legitimate.

You verify that information by comparing a thumbprint. Verifying a host is a bit beyond this post, but do an internet search for “ssh host thumbprint”. You’ll find a variety of articles explaining it in detail.
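
To make the thumbprint less mysterious: it is a hash of the server's public host key. The Python sketch below computes the two common display forms from a public key line in the OpenSSH format ("type base64-data comment"). The function name `fingerprints` is made up, and the key used here is dummy data rather than a real host key.

```python
import base64
import hashlib


def fingerprints(public_key_line: str):
    """Return (MD5, SHA256) fingerprints for an OpenSSH-format public key line."""
    # The second field is the base64-encoded key blob; the hash is taken over its bytes.
    blob = base64.b64decode(public_key_line.split()[1])
    md5_hex = hashlib.md5(blob).hexdigest()
    md5_fp = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))
    sha_b64 = base64.b64encode(hashlib.sha256(blob).digest()).decode("ascii")
    sha_fp = "SHA256:" + sha_b64.rstrip("=")  # shown without base64 padding
    return md5_fp, sha_fp


# Dummy key material for illustration only; a real line comes from the server.
dummy_blob = base64.b64encode(b"not-a-real-host-key").decode("ascii")
md5_fp, sha_fp = fingerprints(f"ssh-rsa {dummy_blob} demo@example")
print(md5_fp)
print(sha_fp)
```

Comparing the value your client shows with one obtained from a trusted source is what "verifying the host" amounts to.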

Once the client accepts the host, the client will send secret information to the host. This is your key data. If the host is able to make a match, it will authenticate you and let you log in. If the host then asks for a password, your key did not work and something is not configured properly. In my experience, it will probably be that your client key file is not in the place your client is expecting it to be.

What happens next depends on the tool you are using. If you are using a terminal program, ssh for example, you will now have a command prompt. If you are using sftp or scp, you will be able to copy files.

In addition to command line tools, there are GUI tools that use the SSH protocol. WinSCP is an excellent SCP client for Windows.

Regardless of the tools you use, SSH is busy encrypting everything you send over the wire. The SSH protocol has evolved over the years, and will probably evolve even more in the future, but it is currently running a very secure form of encryption.

If you are running Linux, you are pretty much finished at this point. SSH ships with every Linux distribution that I am aware of. If you are using Windows, however, you either need to install Cygwin (a Unix environment that runs in Windows), or you’ll want to get PuTTY.

You can download all of the programs discussed in this section at:

I honestly have no idea why PuTTY is spelled PuTTY. I figure the TTY part of it is from the Unix command that output a display. I’m not sure about the Pu though.

I do know what PuTTY is though. PuTTY is a very simple implementation of an MS-Windows SSH terminal client. When I say it is simple, I mean that as a compliment. This is a tool that does not get in the way.

You tell PuTTY to connect to a remote server and, as long as your keys are configured, it will connect you. If you are not using keys, you can connect with passwords (if the host allows that). As a best practice, keys are recommended over passwords.

PuTTY is the terminal client but you can get a couple of other tools from the same author. PSFTP and PSCP offer secure file transfers. These tools are as easy to use as PuTTY and work pretty much the same way.

For command line syntax and configuration, take a look at the documentation at the link above.

A note about SSH keys and PuTTY: the key file you download from EC2 is not directly compatible with PuTTY. This same web site offers a utility called PuTTYgen. When you create a key pair for EC2, you download that file to your local machine. PuTTYgen converts that file (a .pem file) to a PuTTY private key file (a .ppk file).

PuTTY Key Generator

The tool is named puttygen.exe. Run the executable and the above window pops up. To convert an Amazon key to a PuTTY key, use the menu option Conversions → Import Key. Load the .pem file that you downloaded and press the Save Private Key button.

It will warn you about leaving the passphrase blank. That’s ok.

Save the file to the location that PuTTY has been configured to look in for its keys.