Amazon S3 in a nutshell [AWS Solutions Architect Associate Exam]

Amazon S3 is probably the most commonly used service in the AWS cloud. Today I’m going to go through the basic information you’ll need to know about it.

The posts in the AWS Solutions Architect Associate Exam series are based on the notes I made while preparing for the exam.

The goal is to keep things simple: short bullet points that are easy to review quickly.

I hope you’ll find it valuable.

Basics

Amazon S3 stands for Amazon Simple Storage Service.

A safe place to store your files
Object-based storage
Data is spread across multiple devices and facilities
Good for hosting files and static websites, but not suitable for installing an operating system or running a database on

Files / Objects

S3 allows storing files from 0 bytes to 5 terabytes
Unlimited storage
Files are stored in resources called buckets
S3 has a universal namespace -> bucket names must be globally unique
A successful upload to S3 returns the HTTP status code 200 OK
S3 is key-value object-based storage:
- The key -> the name of the object.
- The value -> the actual data.
Supports versioning and metadata (additional data about the object)
Object subresources:
- Access Control Lists (lists of permissions)
- Torrent (BitTorrent protocol support)
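The key-value model above can be pictured with a tiny in-memory sketch. It is only an illustration, not how S3 is implemented: `ToyBucket` and `S3Object` are hypothetical names, and the 5 TB limit is assumed here to mean 5 * 1024**4 bytes. Note that the "/" in a key is just part of the name; the namespace inside a bucket is flat.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Upper size limit for one object; "5 TB" is taken here as 5 * 1024**4 bytes
MAX_OBJECT_SIZE = 5 * 1024**4


@dataclass
class S3Object:
    key: str                                      # the object's name
    value: bytes                                  # the actual data
    metadata: dict = field(default_factory=dict)  # additional data about the object


class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket: a flat key -> object map."""

    def __init__(self, name: str):
        self.name = name
        self._objects: dict[str, S3Object] = {}

    def put(self, key: str, value: bytes, metadata: dict | None = None) -> int:
        if len(value) > MAX_OBJECT_SIZE:
            raise ValueError("objects are limited to 5 TB")
        self._objects[key] = S3Object(key, value, dict(metadata or {}))
        return 200  # mimics the HTTP 200 OK of a successful upload

    def get(self, key: str) -> S3Object:
        return self._objects[key]


bucket = ToyBucket("example-bucket")  # hypothetical bucket name
status = bucket.put("photos/cat.jpg", b"\xff\xd8", {"Content-Type": "image/jpeg"})
print(status, bucket.get("photos/cat.jpg").metadata)  # 200 {'Content-Type': 'image/jpeg'}
```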

Data consistency model in S3

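Read-after-write consistency -> for PUT requests of new objects
Eventual consistency -> for overwrite PUT and DELETE requests
- changes can take some time to propagate
Note: since December 2020, S3 delivers strong read-after-write consistency for all requests, including overwrite PUTs and DELETEs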

Guarantees

Durability: 99.999999999% (eleven nines)
Availability: 99.99% for S3 Standard (S3 One Zone-IA is designed for 99.5%)
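To get a feel for these numbers, here is a small worked calculation. It assumes object losses are independent and that the percentages apply per object per year, which is a simplification; the function names are hypothetical. Exact fractions avoid floating-point noise in numbers like 1 - 0.99999999999.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (leap years ignored)


def expected_annual_losses(num_objects: int, durability_percent: str) -> Fraction:
    """Expected objects lost per year, assuming independent losses."""
    loss_rate = 1 - Fraction(durability_percent) / 100
    return num_objects * loss_rate


def downtime_minutes_per_year(availability_percent: str) -> Fraction:
    """Minutes per year a service may be unavailable at a given availability."""
    return MINUTES_PER_YEAR * (1 - Fraction(availability_percent) / 100)


# 10 million objects at eleven nines -> 1/10,000 objects per year,
# i.e. on average one lost object every 10,000 years
print(float(expected_annual_losses(10_000_000, "99.999999999")))  # 0.0001
print(float(downtime_minutes_per_year("99.99")))  # 52.56 (S3 Standard)
print(float(downtime_minutes_per_year("99.5")))   # 2628.0, about 43.8 hours (One Zone-IA)
```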

Features

Tiered storage
- Amazon S3 Storage Classes
Lifecycle management
- moves objects between tiers automatically, based on rules you define
Versioning
- when enabled, all versions of an object are stored
Encryption
- Encryption in Transit
- Encryption at Rest
Object protection with MFA (Multi-Factor Authentication) Delete
Secure data using Access Control Lists and Bucket Policies

Storage Classes (tiers)

S3 Standard
- 99.99% availability
- 99.999999999% durability
- stored redundantly across multiple devices
- designed to sustain the loss of 2 facilities concurrently
S3 Standard-IA (Infrequent Access)
- 99.9% availability
- good choice for data that is accessed less frequently but requires rapid access when you need it
- lower storage price than S3 Standard, but you are charged a retrieval fee
S3 One Zone-IA
- 99.5% availability
- a lower-cost option for infrequently accessed data
- stored in a single Availability Zone, for data that doesn’t require multi-AZ resilience
S3 Intelligent Tiering
- designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead
S3 Glacier
- for data archiving
- low-cost
- unlimited data storage
- configurable retrieval time (from mins to hours)
S3 Glacier Deep Archive
- lowest cost storage
- standard retrieval within 12 hours
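The choice between the classes above can be summarised as a rough decision rule. This is a hypothetical helper that encodes only the rules of thumb from these notes, not an official AWS selection algorithm; the returned strings are the `StorageClass` values the S3 API uses for each class.

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    archive: bool = False        # kept only for long-term retention
    can_wait_12h: bool = False   # retrieval within ~12 hours is acceptable
    pattern_known: bool = True   # False when access frequency is unpredictable
    frequent: bool = True        # accessed often
    needs_multi_az: bool = True  # must survive the loss of an Availability Zone


def choose_storage_class(p: AccessProfile) -> str:
    """Rule-of-thumb mapping from an access profile to an S3 StorageClass value."""
    if p.archive:
        return "DEEP_ARCHIVE" if p.can_wait_12h else "GLACIER"
    if not p.pattern_known:
        return "INTELLIGENT_TIERING"
    if p.frequent:
        return "STANDARD"
    return "STANDARD_IA" if p.needs_multi_az else "ONEZONE_IA"


print(choose_storage_class(AccessProfile(frequent=False, needs_multi_az=False)))  # ONEZONE_IA
```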

Security

Encryption in Transit:
- SSL / TLS
Encryption at Rest:
- Amazon S3-managed keys -> SSE-S3
- AWS Key Management Service (KMS) managed keys -> SSE-KMS
- Server-side encryption with customer-provided keys -> SSE-C
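When calling the S3 REST API directly, the at-rest option is chosen per request through `x-amz-server-side-encryption*` headers, while encryption in transit simply comes from using the HTTPS endpoint. The sketch below only builds those headers (it sends nothing); `sse_headers` is a hypothetical helper name. For SSE-C you supply a 256-bit key plus its MD5 digest, and S3 does not store the key, so you must send it again to read the object.

```python
import base64
import hashlib
import os


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Build PUT Object request headers for SSE-S3, SSE-KMS or SSE-C."""
    if mode == "SSE-S3":
        # S3 creates and manages the AES-256 key
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            # Optional: without it, S3 uses the AWS managed KMS key for S3
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        key_md5 = hashlib.md5(customer_key).digest()  # lets S3 verify the key arrived intact
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(key_md5).decode(),
        }
    raise ValueError(f"unknown mode: {mode}")


print(sse_headers("SSE-S3"))
print(sorted(sse_headers("SSE-C", customer_key=os.urandom(32))))
```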

Versioning

S3 stores all versions of an object (every write), even if you delete the object
Once enabled, versioning can’t be disabled, only suspended
Versioning is integrated with lifecycle rules
Versioning supports MFA Delete, which requires Multi-Factor Authentication to permanently delete an object version or change the bucket’s versioning state (an additional layer of security)
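The versioning behavior above can be modelled in a few lines. This is a simplified, hypothetical sketch of a bucket with versioning enabled: real S3 uses opaque version IDs and has extra rules for suspended buckets, which are not modelled. It shows the key points: overwrites keep old versions, a plain delete only adds a delete marker, and removing that marker brings the object back.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a bucket with versioning enabled."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data); data None = delete marker
        self._next_id = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._next_id)}"  # simplified, sequential version IDs
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        # A plain DELETE removes nothing: it stacks a delete marker on top
        return self.put(key, None)

    def get(self, key, version_id=None):
        history = self._versions.get(key, [])
        if version_id is None:
            # The latest version wins; if it is a delete marker the object looks deleted
            if not history or history[-1][1] is None:
                raise KeyError(key)
            return history[-1][1]
        for vid, data in history:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))

    def delete_version(self, key, version_id):
        # Permanently removes one version (the operation MFA Delete protects)
        self._versions[key] = [(v, d) for v, d in self._versions.get(key, []) if v != version_id]

    def list_versions(self, key):
        return [vid for vid, _ in self._versions.get(key, [])]


b = VersionedBucket()
v1 = b.put("report.txt", b"draft")
v2 = b.put("report.txt", b"final")
marker = b.delete("report.txt")          # object now looks deleted
print(b.get("report.txt", v1))           # b'draft' (old versions are still there)
b.delete_version("report.txt", marker)   # removing the marker restores the object
print(b.get("report.txt"))               # b'final'
```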

Lifecycle management

Used to move objects between the different storage tiers (classes)
Can be used in conjunction with versioning
Can be applied to current versions and previous versions of objects
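A lifecycle policy is a JSON document of rules. The sketch below builds one in the shape accepted by the S3 PutBucketLifecycleConfiguration API (for example via `aws s3api put-bucket-lifecycle-configuration --bucket <name> --lifecycle-configuration file://lifecycle.json`). The rule ID, the `logs/` prefix and the day counts are hypothetical values; note that S3 expects objects to spend at least 30 days in Standard before a transition to Standard-IA.

```python
import json

# Hypothetical rule: the ID, prefix and day counts are illustrative only
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            # Current versions: Standard -> Standard-IA after 30 days -> Glacier after 90 days
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Previous (noncurrent) versions, relevant only when versioning is enabled
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }
    ]
}

# Write this output to lifecycle.json to use it with the CLI command above
print(json.dumps(lifecycle_configuration, indent=2))
```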

Pricing

You’ll be charged for:

Storage
Requests
Storage Management Pricing
Data Transfer Pricing
Transfer Acceleration
- long-distance file transfer, using Edge Locations (via Amazon Global Backbone network)
Cross-Region Replication
- data replication across AWS Regions
S3 tiers ordered by price (most expensive first):
- S3 Standard
- S3 IA
- S3 Intelligent Tiering – a good choice when access patterns are unknown or changing
- S3 One Zone – IA
- Glacier
- Glacier Deep Archive

Sharing buckets across accounts

With Bucket Policies and IAM
- programmatic access only (via API or AWS CLI)
- applies across the entire bucket
With Bucket ACLs & IAM
- programmatic access only
- applies to individual objects
With cross-account IAM Roles
- programmatic and console (AWS Management Console) access
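For the bucket policy option, the owning account attaches a policy naming the other account as principal. Below is a minimal sketch that builds such a policy; the bucket name and account ID are hypothetical placeholders. Users in the other account still need IAM permissions for the same actions in their own account, which is the "& IAM" part of the notes above. Note that listing is a bucket-level action while reading is an object-level one, hence the two `Resource` ARNs.

```python
import json

# Hypothetical values: replace with your bucket name and the other account's ID
BUCKET = "example-shared-bucket"
OTHER_ACCOUNT_ID = "111122223333"


def cross_account_read_policy(bucket: str, account_id: str) -> dict:
    """Bucket policy letting another account list the bucket and read its objects."""
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",    # bucket-level action
            },
            {
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # object-level action
            },
        ],
    }


print(json.dumps(cross_account_read_policy(BUCKET, OTHER_ACCOUNT_ID), indent=2))
```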

S3 Cross-Region Replication

Versioning must be enabled on both source and destination buckets
Source and destination buckets must be in different Regions
Existing files in the bucket are not replicated automatically
Delete markers aren’t replicated
Deleting individual versions or delete markers will not be replicated
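The two setup requirements above (versioning on both buckets, different Regions) can be expressed as a pre-flight check. `BucketInfo` and `crr_problems` are hypothetical names; a real setup would read these values from the buckets themselves, and this check does not cover other requirements such as the IAM role S3 uses to replicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketInfo:
    name: str
    region: str
    versioning: str  # "Enabled", "Suspended" or "Unversioned"


def crr_problems(source: BucketInfo, destination: BucketInfo) -> List[str]:
    """Reasons Cross-Region Replication can't be set up; empty list if none found."""
    problems = []
    for bucket in (source, destination):
        if bucket.versioning != "Enabled":
            problems.append(f"versioning is not enabled on {bucket.name}")
    if source.region == destination.region:
        problems.append("source and destination must be in different Regions")
    return problems


src = BucketInfo("src-bucket", "eu-west-1", "Enabled")
dst = BucketInfo("dst-bucket", "eu-west-1", "Suspended")
print(crr_problems(src, dst))
```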

S3 Transfer Acceleration

S3 Transfer Acceleration utilizes CloudFront Edge Network to accelerate S3 uploads
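Once acceleration is enabled on a bucket, clients opt in by sending requests to the bucket’s accelerate endpoint instead of the regular one, and the upload enters the AWS network at the nearest edge location. A small sketch, assuming a hypothetical helper name; acceleration requires a DNS-compliant bucket name without dots, which the regex approximates.

```python
import re

# DNS-compliant name without dots: 3-63 chars of lowercase letters, digits and
# hyphens, starting and ending with a letter or digit
_ACCEL_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def accelerate_endpoint(bucket: str) -> str:
    """Transfer Acceleration endpoint URL for a bucket."""
    if not _ACCEL_NAME.match(bucket):
        raise ValueError(f"{bucket!r} cannot be used with Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("my-upload-bucket"))  # https://my-upload-bucket.s3-accelerate.amazonaws.com
```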

CloudFront – CDN service

Edge Location
- the location where content is cached (separate from an AWS Region or AZ)
- not read-only: you can write (PUT) objects to them too
Origin
- the source of all the files that the CDN will distribute
  - e.g.: S3 bucket, EC2 instance, Elastic Load Balancer or Route 53
Distribution – CDN collection of edge locations
- Types:
  - Web Distribution -> typically used for websites
  - RTMP -> typically for media streaming
Objects are cached for a set TTL (Time To Live)
CloudFront invalidation
- lets you remove objects from CloudFront caches before their TTL expires (charged beyond a monthly free allowance of invalidation paths)
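TTL caching and invalidation can be illustrated with a toy edge cache. This is a hypothetical model, not CloudFront’s implementation: a dictionary stands in for the origin, and time is passed in explicitly so the behavior is easy to follow. It shows why a changed origin file can stay stale until its TTL runs out, and how invalidation forces a fresh fetch.

```python
class EdgeCache:
    """Toy edge location: serves cached copies until their TTL expires."""

    def __init__(self, origin: dict, ttl_seconds: float):
        self.origin = origin   # stands in for the S3 bucket / origin server
        self.ttl = ttl_seconds
        self._cache = {}       # path -> (body, expires_at)
        self.origin_fetches = 0

    def get(self, path, now):
        entry = self._cache.get(path)
        if entry and now < entry[1]:
            return entry[0]    # cache hit: origin not contacted
        body = self.origin[path]  # miss or expired: fetch from the origin
        self.origin_fetches += 1
        self._cache[path] = (body, now + self.ttl)
        return body

    def invalidate(self, path):
        # Like a CloudFront invalidation: drop the copy before its TTL runs out
        self._cache.pop(path, None)


origin = {"/index.html": "v1"}
edge = EdgeCache(origin, ttl_seconds=60)
edge.get("/index.html", now=0)          # fetched from the origin
origin["/index.html"] = "v2"
print(edge.get("/index.html", now=30))  # v1 (stale copy, TTL not expired yet)
edge.invalidate("/index.html")
print(edge.get("/index.html", now=31))  # v2
```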

AWS Storage Gateway

On-premises access to virtually unlimited cloud storage
Hybrid cloud storage service
Types:
- File Gateway
  - for flat files
  - stored directly on S3
- Volume Gateway
  - stored volumes
    - the entire dataset is stored on-site and asynchronously backed up to S3
  - cached volumes
    - the entire dataset is stored on S3 and the most frequently accessed data is cached on-site
- Tape Gateway (Gateway Virtual Tape Library)
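The difference between cached and stored volumes comes down to where the full copy lives. Below is a toy sketch of a cached volume, assuming a least-recently-used eviction policy and synchronous writes purely for illustration (the real gateway uploads asynchronously); `CachedVolume` is a hypothetical name and a dictionary stands in for S3.

```python
from collections import OrderedDict


class CachedVolume:
    """Toy cached volume: S3 holds every block, the on-site cache keeps recent ones."""

    def __init__(self, s3_blocks: dict, cache_size: int):
        self.s3 = s3_blocks           # full dataset (stands in for S3)
        self.cache = OrderedDict()    # block_id -> data, least recently used first
        self.cache_size = cache_size

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as recently used
            return self.cache[block_id], "cache"
        data = self.s3[block_id]              # cache miss: fetch from S3
        self._remember(block_id, data)
        return data, "s3"

    def write(self, block_id, data):
        # Simplification: written synchronously to S3, then kept hot in the cache
        self.s3[block_id] = data
        self._remember(block_id, data)

    def _remember(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)    # evict the least recently used block


vol = CachedVolume({1: "a", 2: "b", 3: "c"}, cache_size=2)
print(vol.read(1))  # ('a', 's3')
print(vol.read(1))  # ('a', 'cache')
```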

Athena and Macie

Amazon Athena (SQL query-based)

Interactive query service
Lets you analyze and query data stored in S3 buckets
Using standard SQL
Serverless
Commonly used for log analysis
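To show the kind of standard SQL used for log analysis, here is a locally runnable sketch. In Athena you would first define a table over the S3 location of your logs and then run the query in the service; here Python’s built-in `sqlite3` stands in for Athena, and the table, column names and rows are hypothetical.

```python
import sqlite3

# sqlite3 (in memory) stands in for Athena; rows are hypothetical access-log entries
rows = [
    ("GET", "/index.html", 200),
    ("GET", "/missing.png", 404),
    ("PUT", "/upload.bin", 200),
    ("GET", "/index.html", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_logs (operation TEXT, object_key TEXT, http_status INTEGER)")
conn.executemany("INSERT INTO access_logs VALUES (?, ?, ?)", rows)

# The kind of standard SQL you would run in Athena over logs stored in S3
query = """
    SELECT http_status, COUNT(*) AS requests
    FROM access_logs
    GROUP BY http_status
    ORDER BY requests DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [(200, 3), (404, 1)]
```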

Amazon Macie (for security)

Security service
Uses ML (machine learning) and NLP (natural language processing)
Analyzes S3 objects for sensitive data such as PII (Personally Identifiable Information)
Can be used to analyze CloudTrail logs

Resources

Read these to learn more about Amazon S3: