Dan Collins
Published on

AWS Simple Storage Service (S3) Part 1

Table of Contents

  1. Terminology
  2. Storage Classes
  3. S3 Bucket Features
  4. S3 Batch Operations
  5. S3 Access Points
  6. S3 Select
  7. S3 Access Control (Block Public Access)
  8. S3 Common Limits

S3 is a flat object store; it is not a 'filesystem'.

In object storage, each object consists of data, metadata, and a key.

Terminology

Buckets

  • Identify where an object belongs
  • Objects are stored in buckets and listed in lexicographical order by key
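
To make the ordering concrete, here is a small sketch that sorts a few made-up keys the way a bucket listing would. It assumes the ordering is a byte-wise comparison of the UTF-8 encoded keys; the key names and the `listing_order` helper are hypothetical.

```python
from typing import Iterable, List


def listing_order(keys: Iterable[str]) -> List[str]:
    """Sort keys by their UTF-8 bytes (assumed to match S3 listing order)."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


# Hypothetical keys: note that ordering is not numeric and is case-sensitive.
example_keys = ["photos/2.jpg", "photos/10.jpg", "Photos/1.jpg", "photos/1.jpg"]
ordered = listing_order(example_keys)
```

Because the comparison is character by character, "photos/10.jpg" lists before "photos/2.jpg", and the uppercase "Photos/" prefix lists before the lowercase one.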

Object

  • Logical entities or containers with information used to store data on S3 (like files)
  • Can contain any data, usually bytes in a specific order

Prefix

  • Name or characters of the key that come before the object name
  • By convention ends with a forward slash (/), the usual delimiter
  • Example: bucketName/<prefix>/objectName.jpg

Key

  • Unique name and path to an object in S3
  • Excludes the bucket name
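
The relationship between bucket, prefix, and key can be sketched with a small parser. This is only an illustration: `split_s3_path` and `S3Location` are hypothetical names, not part of any AWS SDK, and the sketch assumes "/" is the delimiter.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    bucket: str  # bucket name (not part of the key)
    key: str     # full key: prefix + object name
    prefix: str  # everything in the key up to and including the last "/"
    name: str    # object name after the last "/"


def split_s3_path(path: str) -> S3Location:
    """Split 'bucketName/prefix/objectName.jpg' (optionally with 's3://')."""
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError("expected '<bucket>/<key>'")
    head, sep, name = key.rpartition("/")
    return S3Location(bucket=bucket, key=key, prefix=head + sep, name=name)


location = split_s3_path("s3://bucketName/photos/2024/objectName.jpg")
```

Here `location.key` is "photos/2024/objectName.jpg" and `location.prefix` is "photos/2024/". The "folders" are just part of the key, which is what makes the store flat.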

Value

  • Literal characters (the object's data) that correspond to a key
  • Example: <key> : <value>

Version ID

  • When enabled, versioning allows multiple copies of an object under the same object name, each with a unique version ID

Metadata

  • Key-Value pairs that describe an object
  • Standard HTTP metadata includes object content type
  • Custom metadata can be specified
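
As a rough sketch of how standard and custom metadata travel with an upload, the helper below builds the keyword arguments for boto3's `put_object` and shows the HTTP headers they correspond to. The helper names and the example values are hypothetical; the sketch assumes custom metadata is sent as `x-amz-meta-*` headers.

```python
from typing import Dict


def build_put_object_args(bucket: str, key: str, body: bytes,
                          content_type: str,
                          custom_metadata: Dict[str, str]) -> dict:
    """Keyword arguments for s3_client.put_object(**args)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ContentType": content_type,        # standard HTTP metadata
        "Metadata": dict(custom_metadata),  # custom (user-defined) metadata
    }


def as_http_headers(args: dict) -> Dict[str, str]:
    """Show the metadata as the request headers it is assumed to map to."""
    headers = {"Content-Type": args["ContentType"]}
    for name, value in args["Metadata"].items():
        headers["x-amz-meta-" + name.lower()] = value
    return headers


put_args = build_put_object_args(
    "example-bucket", "photos/cat.jpg", b"...", "image/jpeg",
    {"project": "demo"},
)
headers = as_http_headers(put_args)
```

With a real client this would be used as `boto3.client("s3").put_object(**put_args)`.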

Access Control

  • Access control lists (ACLs) were originally used, before AWS Identity and Access Management (IAM), for controlling access to objects

Storage Classes

Amazon S3 offers a range of storage classes designed for different use cases.

Storage Types:

  • S3 Standard (The default, good for frequent access)
  • S3 Intelligent-Tiering
  • S3 Standard-Infrequent Access
  • S3 One Zone-Infrequent Access
  • S3 Glacier
  • S3 Glacier Deep Archive (Cheapest option; standard retrieval can take up to 12 hours)
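
When uploading through the API, the storage class is chosen per object with a `StorageClass` value. The mapping below is a sketch that pairs the names in the list above with the API constants as I understand them; the `storage_class_for` helper is hypothetical.

```python
# Display name (as listed above) -> StorageClass value used by the S3 API.
STORAGE_CLASSES = {
    "S3 Standard": "STANDARD",
    "S3 Intelligent-Tiering": "INTELLIGENT_TIERING",
    "S3 Standard-Infrequent Access": "STANDARD_IA",
    "S3 One Zone-Infrequent Access": "ONEZONE_IA",
    "S3 Glacier": "GLACIER",
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
}


def storage_class_for(display_name: str) -> str:
    """Look up the API constant, failing loudly on an unknown name."""
    try:
        return STORAGE_CLASSES[display_name]
    except KeyError:
        raise ValueError("unknown storage class: " + display_name) from None


archive_class = storage_class_for("S3 Glacier Deep Archive")
```

With boto3 the value would be passed as `put_object(..., StorageClass=archive_class)`; omitting it falls back to S3 Standard.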

S3 Bucket Features

S3 Bucket Properties Tab

Versioning

Reading Resource: Using Versioning

Once versioning is enabled, every new object gets a version ID.
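
A minimal sketch of enabling versioning and listing an object's version IDs with boto3 follows. The client is passed in so the functions can be exercised without AWS credentials; the function names are hypothetical, and pagination of `list_object_versions` is ignored for brevity.

```python
from typing import List


def enable_versioning(s3_client, bucket: str) -> None:
    """Turn on versioning for a bucket."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def version_ids(s3_client, bucket: str, key: str) -> List[str]:
    """Return the version IDs stored for exactly this key (first page only)."""
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    return [v["VersionId"] for v in response.get("Versions", [])
            if v["Key"] == key]


# Usage with a real client (requires boto3 and credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   enable_versioning(s3, "example-bucket")
#   print(version_ids(s3, "example-bucket", "report.csv"))
```

The `Prefix` filter also matches longer keys that start with the same characters, which is why the result is filtered on an exact key match.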

Server Access Logging

Provides detailed records of requests for a bucket.

Reading Resource: Logging requests using server access logging

Static Web Hosting

Reading Resource: Static Web Hosting

A bucket can host a static website whose pages contain static content and, optionally, client-side scripts.

Object-level Logging

Reading Resource: CloudTrail Event Logging

This allows AWS CloudTrail to log data events for objects in an S3 bucket.

Default Encryption

Reading Resource: Enabling Encryption

Automatically encrypts new objects with the selected encryption type. The options are server-side encryption with Amazon S3 managed keys (SSE-S3), or with AWS KMS keys (SSE-KMS) using either an AWS managed key or a customer managed key. The default encrypts the customer's data at rest.
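
As a sketch of what a default-encryption setting looks like through the API, the helper below builds the configuration that boto3's `put_bucket_encryption` expects. The `default_encryption_config` name and its parameters are hypothetical; the sketch assumes "AES256" selects S3-managed keys and "aws:kms" selects KMS keys.

```python
from typing import Optional


def default_encryption_config(use_kms: bool = False,
                              kms_key_id: Optional[str] = None) -> dict:
    """Build a ServerSideEncryptionConfiguration for a bucket."""
    if not use_kms:
        if kms_key_id is not None:
            raise ValueError("kms_key_id requires use_kms=True")
        rule = {"SSEAlgorithm": "AES256"}  # S3-managed keys
    else:
        rule = {"SSEAlgorithm": "aws:kms"}  # KMS keys
        if kms_key_id is not None:
            # A specific KMS key; without it the AWS managed key is assumed.
            rule["KMSMasterKeyID"] = kms_key_id
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": rule}]}


s3_managed = default_encryption_config()
kms_managed = default_encryption_config(use_kms=True, kms_key_id="example-key-id")

# Usage with a real client (requires boto3 and credentials):
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="example-bucket",
#       ServerSideEncryptionConfiguration=s3_managed,
#   )
```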

Advanced Settings

Includes:

  • Object Lock: locks an object until a specific date.
  • Tags: track storage cost or other criteria for individual projects.
  • Transfer Acceleration
  • Events: trigger an action, such as a Lambda function, when an event occurs.
  • Requester Pays: the requester, rather than the bucket owner, pays the cost of retrieving an object.

S3 Bucket Management Tab

S3 Batch Operations

Reading Resource: Performing large-scale batch operations on Amazon S3 objects

You can use S3 Batch Operations to perform large-scale batch operations on Amazon S3 objects. S3 Batch Operations can perform a single operation on lists of Amazon S3 objects that you specify. A single job can perform a specified operation on billions of objects containing exabytes of data. Amazon S3 tracks progress, sends notifications, and stores a detailed completion report of all actions, providing a fully managed, auditable, and serverless experience. You can use S3 Batch Operations through the AWS Management Console, AWS CLI, AWS SDKs, or REST API.

S3 Batch Operations can set a job to execute batch operations on a list of S3 objects contained in a manifest object.
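
To illustrate what a manifest might look like, the sketch below writes a CSV manifest with one `bucket,key` row per object. It assumes the CSV manifest format with URL-encoded object keys; the bucket and key names are made up and `build_manifest_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Iterable
from urllib.parse import quote


def build_manifest_csv(bucket: str, keys: Iterable[str]) -> str:
    """Return CSV manifest text: one 'bucket,url-encoded-key' row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        # Encode spaces, commas, etc.; "/" is left readable (an assumption).
        writer.writerow([bucket, quote(key, safe="/")])
    return buffer.getvalue()


manifest = build_manifest_csv("example-bucket", ["photos/a b.jpg", "c.txt"])
```

The resulting text would be uploaded to S3 as the manifest object and referenced when creating the job.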

S3 Access Points

  • Simplify access control for large, shared, or multi-tenant buckets.
  • Allow applications or users to interact with a multi-tenant bucket through a dedicated access point with custom permissions.

Reading Resource: Managing data access with Amazon S3 access points

Amazon S3 access points simplify managing data access at scale for shared datasets in S3. Access points are named network endpoints that are attached to buckets that you can use to perform S3 object operations, such as GetObject and PutObject. Each access point has distinct permissions and network controls that S3 applies for any request that is made through that access point. Each access point enforces a customized access point policy that works in conjunction with the bucket policy that is attached to the underlying bucket.
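
A brief sketch of addressing an object through an access point: with boto3, the access point ARN is passed where a bucket name would normally go. The ARN layout shown is the one I understand AWS to use; the region, account ID, access point name, and helper names are all hypothetical.

```python
def access_point_arn(region: str, account_id: str, name: str,
                     partition: str = "aws") -> str:
    """Build an S3 access point ARN."""
    return "arn:{}:s3:{}:{}:accesspoint/{}".format(
        partition, region, account_id, name)


def get_via_access_point(s3_client, arn: str, key: str) -> bytes:
    """GetObject through the access point instead of the bucket name."""
    response = s3_client.get_object(Bucket=arn, Key=key)
    return response["Body"].read()


arn = access_point_arn("us-west-2", "123456789012", "analytics-team")

# Usage with a real client (requires boto3 and credentials):
#   import boto3
#   data = get_via_access_point(boto3.client("s3"), arn, "reports/q1.csv")
```

The request is then evaluated against both the access point policy and the underlying bucket policy.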

S3 Select

  • The S3 Select feature can run SQL queries in place against CSV and JSON objects.

Reading Resource: Filtering and retrieving data using Amazon S3 Select

With Amazon S3 Select, you can use simple structured query language (SQL) statements to filter the contents of an Amazon S3 object and retrieve just the subset of data that you need. By using Amazon S3 Select to filter this data, you can reduce the amount of data that Amazon S3 transfers, which reduces the cost and latency to retrieve this data.

S3 Select Requirements and Limits

  • You must have s3:GetObject permission for the object you are querying.
  • You must use HTTPS and provide the encryption key when the object you are querying is encrypted with a customer-provided key (SSE-C).
  • Max length of a SQL expression is 256 KB.
  • S3 Select can only emit nested data using the JSON output format.
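
The sketch below shows roughly how a query could be issued with boto3's `select_object_content` while respecting the expression-length limit above. It assumes a CSV object with a header row and JSON output; the helper names, bucket, key, and query are hypothetical.

```python
from typing import Iterable

MAX_SQL_EXPRESSION_BYTES = 256 * 1024  # 256 KB limit on the SQL expression


def build_select_args(bucket: str, key: str, expression: str) -> dict:
    """Keyword arguments for s3_client.select_object_content(**args)."""
    if len(expression.encode("utf-8")) > MAX_SQL_EXPRESSION_BYTES:
        raise ValueError("SQL expression exceeds 256 KB")
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Assumption: the object is CSV and its first row holds column names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"JSON": {}},
    }


def collect_records(event_stream: Iterable[dict]) -> str:
    """Join the 'Records' payloads of the response event stream."""
    chunks = [event["Records"]["Payload"]
              for event in event_stream if "Records" in event]
    return b"".join(chunks).decode("utf-8")


select_args = build_select_args(
    "example-bucket", "people.csv",
    "SELECT s.name FROM S3Object s WHERE s.city = 'Oslo'",
)

# Usage with a real client (requires boto3 and credentials):
#   response = boto3.client("s3").select_object_content(**select_args)
#   print(collect_records(response["Payload"]))
```

The response is an event stream, so other event types such as statistics and end markers are skipped when collecting rows.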

S3 Select Errors

Amazon S3 Select returns an error code and an error message when an issue is encountered.

Reading Resource: List of SELECT Object Content Error Codes

S3 Select SQL Reference

Reading Resource: SQL reference for Amazon S3 Select and S3 Glacier Select

S3 Access Control (Block Public Access)

Public Access is granted through access control lists (ACLs), bucket policies, access point policies, or a combination.

Reading Resource: Blocking public access to your Amazon S3 storage

With S3 Block Public Access, account administrators and bucket owners can easily set up centralized controls to limit public access to their Amazon S3 resources that are enforced regardless of how the resources are created.
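
As a sketch, all four Block Public Access settings can be turned on for a bucket with boto3's `put_public_access_block`. The client is passed in so the function can be tried without credentials; `block_all_public_access` is a hypothetical name.

```python
def block_all_public_access(s3_client, bucket: str) -> None:
    """Enable every Block Public Access setting on one bucket."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,        # reject new public ACLs
            "IgnorePublicAcls": True,       # ignore existing public ACLs
            "BlockPublicPolicy": True,      # reject new public bucket policies
            "RestrictPublicBuckets": True,  # restrict access under public policies
        },
    )


# Usage with a real client (requires boto3 and credentials):
#   import boto3
#   block_all_public_access(boto3.client("s3"), "example-bucket")
```

The same four settings can also be applied at the account level, which is what makes the control centralized.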

S3 Common Limits

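  • Limit of 1,000 buckets per account at the time of writing (AWS has changed bucket quotas over time, so check the current S3 service quotas)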
  • No limit to the amount of data/objects in a bucket
  • Hard limit of 100 event notifications and 1,000 lifecycle rules per bucket
  • Bucket policies are limited to 20 KB
  • Maximum object size is 5 TB
  • Maximum size of a single HTTP PUT request is 5 GB
  • Maximum size of a single part in a multipart upload is 5 GB (the assembled object can be up to 5 TB)
  • Maximum number of parts for a multi-part upload is 10,000
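
The multipart limits interact: a large object needs parts big enough to stay within 10,000 parts. The sketch below works through that arithmetic, assuming a 5 MiB minimum and 5 GiB maximum part size and treating the sizes as binary units; `plan_multipart` and the 8 MiB preferred part size are hypothetical choices.

```python
from typing import Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART_SIZE = 5 * MIB    # assumed minimum part size (except the last part)
MAX_PART_SIZE = 5 * GIB    # maximum size of a single part
MAX_PARTS = 10_000         # maximum number of parts per upload
MAX_OBJECT_SIZE = 5 * TIB  # maximum object size


def plan_multipart(object_size: int,
                   preferred_part_size: int = 8 * MIB) -> Tuple[int, int]:
    """Return (part_size, part_count) that fits within the limits."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Smallest part size that keeps the upload within MAX_PARTS parts.
    needed = (object_size + MAX_PARTS - 1) // MAX_PARTS
    part_size = max(preferred_part_size, MIN_PART_SIZE, needed)
    if part_size > MAX_PART_SIZE:
        raise ValueError("part size would exceed 5 GiB")
    part_count = (object_size + part_size - 1) // part_size
    return part_size, part_count


small_plan = plan_multipart(100 * MIB)
large_plan = plan_multipart(5 * TIB)
```

For a 100 MiB object the preferred 8 MiB parts are fine, while a 5 TiB object forces parts of roughly 524 MiB to stay within 10,000 parts.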

Reading Resource: Best practices design patterns: optimizing Amazon S3 performance

Your applications can easily achieve thousands of transactions per second in request performance when uploading and retrieving storage from Amazon S3. Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by parallelizing requests, for example by spreading them across multiple prefixes.
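
A small sketch of both ideas: the per-prefix figures scale with the number of prefixes, and requests can be issued concurrently. The `fetch` callable is injected so the example runs without AWS; with boto3 it could wrap `get_object`. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")

GET_PER_SECOND_PER_PREFIX = 5_500
PUT_PER_SECOND_PER_PREFIX = 3_500


def aggregate_get_rate(prefix_count: int) -> int:
    """Baseline GET/HEAD rate when reads are spread evenly over prefixes."""
    return prefix_count * GET_PER_SECOND_PER_PREFIX


def fetch_all(fetch: Callable[[str], T], keys: Iterable[str],
              max_workers: int = 8) -> Dict[str, T]:
    """Call fetch(key) for every key using a thread pool."""
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(key_list, pool.map(fetch, key_list)))


ten_prefix_rate = aggregate_get_rate(10)

# Usage with a real client (requires boto3 and credentials):
#   s3 = boto3.client("s3")
#   read = lambda key: s3.get_object(Bucket="example-bucket", Key=key)["Body"].read()
#   objects = fetch_all(read, ["a/1.csv", "b/2.csv"])
```

Threads suit this workload because each request spends most of its time waiting on the network.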