Amazon S3 in a nutshell [AWS Solutions Architect Associate Exam]

Amazon S3 is probably the most commonly used service in the AWS cloud. Today I’m going to go through the basics you’ll need to know about it.

The format of this AWS Solutions Architect Associate Exam series is based on the notes I made while preparing for the Solutions Architect Associate exam.

The goal is to keep it simple, in the form of bullet points that are easy to review quickly.

I hope you’ll find it valuable.

Basics

Amazon S3 stands for Amazon Simple Storage Service.

  • It’s a safe place to store your files
  • Object-based storage
  • Data is spread across multiple devices and facilities
  • Good for hosting files and static websites, but not suitable for installing an operating system or a database

Files / Objects

  • S3 allows storing files from 0 bytes to 5 terabytes
  • Unlimited storage
  • Files are stored in resources called buckets
  • S3 has a universal namespace -> bucket names must be globally unique
  • A successful upload to S3 returns the HTTP response code 200 OK
  • S3 is key-value object-based storage:
    • The key -> the name of the object.
    • The value -> the actual data.
  • Supports versioning of files and attaching additional data about the object (metadata)
  • Object subresources:
    • Access Control Lists (list with permissions)
    • Torrent (BitTorrent protocol support)
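
To make the key/value/metadata model and the 200 OK check a bit more concrete, here is a minimal Python sketch. The bucket name, key and metadata are hypothetical, and the response dictionary only imitates the `ResponseMetadata` shape that boto3 returns from `put_object`; nothing here talks to AWS.

```python
def build_put_request(bucket: str, key: str, body: bytes, metadata: dict) -> dict:
    """Assemble keyword arguments in the shape boto3's put_object accepts."""
    return {"Bucket": bucket, "Key": key, "Body": body, "Metadata": metadata}


def upload_succeeded(response: dict) -> bool:
    """Return True when a boto3-style response carries HTTP status 200."""
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200


# Hypothetical names, for illustration only.
request = build_put_request(
    bucket="my-example-bucket",
    key="reports/summary.txt",    # the key: the name of the object
    body=b"hello",                # the value: the actual data
    metadata={"author": "me"},    # additional data about the object
)

# With real credentials the call would look roughly like:
#   response = boto3.client("s3").put_object(**request)
# Here a hand-written stand-in response is used instead.
fake_response = {"ResponseMetadata": {"HTTPStatusCode": 200}}
print(upload_succeeded(fake_response))
```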

Data consistency model in S3

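  • Since December 2020, S3 provides strong read-after-write consistency for all PUT and DELETE requests, including overwrites
  • Older exam material describes the previous model:
    • Read-after-write consistency -> for PUT requests with new objects
    • Eventual consistency -> for overwrite PUT and DELETE requests (changes could take some time to propagate)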

Guarantees

  • Durability: 99.999999999% (eleven nines)
  • Availability: 99.99% (One Zone – IA has 99.5%)
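
It helps to translate these percentages into something tangible. The sketch below is just arithmetic on the figures above; the helper names are my own, and it assumes a 365-day year.

```python
from decimal import Decimal

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: str) -> Decimal:
    """Minutes per year a service may be unavailable at the given availability."""
    unavailable_fraction = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable_fraction * MINUTES_PER_YEAR


def expected_objects_lost_per_year(durability_percent: str, object_count: int) -> Decimal:
    """Average number of objects lost per year at the given durability."""
    loss_fraction = (Decimal(100) - Decimal(durability_percent)) / Decimal(100)
    return loss_fraction * object_count


print(downtime_minutes_per_year("99.99"))
print(downtime_minutes_per_year("99.5"))
print(expected_objects_lost_per_year("99.999999999", 10_000_000))
```

In other words, 99.99% availability allows about 52.56 minutes of downtime per year, 99.5% allows about 43.8 hours, and eleven nines of durability means that with 10 million objects you would expect to lose, on average, one object every 10,000 years.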

Features

  • Tiered storage
    • Amazon S3 Storage Classes
  • Lifecycle management
    • enables moving objects between tiers
  • Versioning
    • when enabled, all versions of an object are stored
  • Encryption
    • Encryption in Transit
    • Encryption at Rest
  • Object protection with MFA (Multi-Factor Authentication) Delete
  • Securing data with Access Control Lists and Bucket Policies

Storage Classes (tiers)

  • S3 Standard
    • 99.99% availability
    • 99.999999999% durability
    • stored redundantly across multiple devices
    • designed to sustain the loss of 2 facilities concurrently
  • S3 – IA (Infrequent Access)
    • for infrequently accessed data
    • a good choice for data that is accessed less often but requires rapid access when you need it
    • lower storage fee than S3 Standard, but you are charged a retrieval fee
  • S3 One Zone – IA
    • 99.5% availability
    • a lower-cost option for IA data
    • for data that doesn’t require the multiple Availability Zone data resilience model
  • S3 Intelligent Tiering
    • designed for cost optimization by automatically moving data to the most cost-effective access tier (without performance impact or operational overhead)
  • S3 Glacier
    • for data archiving
    • low-cost
    • unlimited data storage
    • configurable retrieval time (from minutes to hours)
  • S3 Glacier Deep Archive
    • lowest-cost storage
    • retrieval within 12 hours

Security

  • Encryption in Transit:
    • SSL / TLS
  • Encryption at Rest:
    • S3-Managed Keys -> SSE-S3
    • AWS Key Management Service managed keys -> SSE-KMS
    • Server-side encryption with customer-provided keys -> SSE-C
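
On the wire, the three at-rest options differ mainly in the request headers sent with an upload. Here is a minimal sketch that builds those headers; the function name and the sample key are my own, it sends no request, and a real client such as boto3 sets these headers for you.

```python
import base64
import hashlib


def sse_headers(mode: str, kms_key_id: str = "", customer_key: bytes = b"") -> dict:
    """Build the S3 request headers for one server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # optional: otherwise the default KMS key is used
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if len(customer_key) != 32:
            raise ValueError("SSE-C expects a 256-bit (32-byte) key")
        key_md5 = hashlib.md5(customer_key).digest()  # integrity check of the key
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(key_md5).decode("ascii"),
        }
    raise ValueError(f"unknown encryption mode: {mode}")


# A throwaway all-zero key, for illustration only; never use one in practice.
demo_key = bytes(32)
for mode in ("SSE-S3", "SSE-KMS", "SSE-C"):
    print(mode, sorted(sse_headers(mode, customer_key=demo_key)))
```

The practical difference: with SSE-S3 and SSE-KMS the keys live on the AWS side, while with SSE-C you send the key with every request and have to manage it yourself.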

Versioning

  • S3 stores all versions of an object (all writes), even if you delete a versioned object
  • Once enabled, versioning can’t be disabled, only suspended
  • Versioning is integrated with lifecycle rules
  • Versioning supports MFA Delete, which uses Multi-Factor Authentication to provide an additional layer of security
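
The "delete doesn't really delete" behaviour is easier to see in a toy model. The class below is an in-memory sketch of my own, not the S3 API: a delete only adds a delete marker on top of the version stack, and older versions stay reachable by their version id.

```python
from collections import defaultdict

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class ToyVersionedBucket:
    """In-memory imitation of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = defaultdict(list)  # key -> versions, oldest first

    def put(self, key, data):
        """Store a new version and return its (toy) version id."""
        self._versions[key].append(data)
        return len(self._versions[key]) - 1

    def delete(self, key):
        """A plain delete only adds a delete marker; nothing is erased."""
        self._versions[key].append(DELETE_MARKER)

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one when version_id is given."""
        versions = self._versions.get(key, [])
        if not versions:
            raise KeyError(key)
        data = versions[-1] if version_id is None else versions[version_id]
        if data is DELETE_MARKER:
            raise KeyError(key)
        return data


bucket = ToyVersionedBucket()
first = bucket.put("notes.txt", "draft")
second = bucket.put("notes.txt", "final")
bucket.delete("notes.txt")

try:
    bucket.get("notes.txt")
except KeyError:
    print("latest version is a delete marker")
print(bucket.get("notes.txt", version_id=first))
```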

Lifecycle management

  • Used to move objects between the different storage tiers (classes)
  • Can be used in conjunction with versioning
  • Can be applied to current versions and previous versions of objects
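
A lifecycle rule is ultimately just a small configuration document. Here is a sketch that builds one in the shape boto3's `put_bucket_lifecycle_configuration` expects; the rule id, prefix, day counts and bucket name are hypothetical, and the actual API call is left as a comment.

```python
import json


def lifecycle_rule(rule_id, prefix, current_transitions, noncurrent_transitions):
    """Build one lifecycle rule covering current and previous object versions."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Transitions for the current version of each object.
        "Transitions": [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in current_transitions
        ],
        # Transitions for previous (noncurrent) versions, used with versioning.
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": days, "StorageClass": storage_class}
            for days, storage_class in noncurrent_transitions
        ],
    }


config = {
    "Rules": [
        lifecycle_rule(
            "archive-logs",                               # hypothetical rule id
            "logs/",                                      # hypothetical key prefix
            [(30, "STANDARD_IA"), (90, "GLACIER")],
            [(30, "GLACIER")],
        )
    ]
}
print(json.dumps(config, indent=2))

# With real credentials this would be applied roughly like:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket", LifecycleConfiguration=config)
```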

Pricing

You’ll be charged for:

  • Storage
  • Requests
  • Storage Management Pricing
  • Data Transfer Pricing
  • Transfer Acceleration
    • long-distance file transfer, using Edge Locations (via Amazon Global Backbone network)
  • Cross-Region Replication
    • data replication across AWS Regions
  • S3 tiers ordered by price (from the most expensive ones):
    • S3 Standard
    • S3 IA
    • S3 Intelligent Tiering – the best choice for the common cases
    • S3 One Zone – IA
    • Glacier
    • Glacier Deep Archive

Sharing buckets across accounts

  • With Bucket Policies and IAM
    • programmatic access only (via API or AWS CLI)
    • applies across the entire bucket
  • With Bucket ACLs & IAM
    • programmatic access only
    • applies to individual objects
  • With cross-account IAM Roles
    • programmatic and console (AWS Management Console) access
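
For the bucket policy option, here is a minimal sketch of what such a policy document could look like. The bucket name and the 12-digit account id are placeholders, and the set of allowed actions is only an example of read-only access.

```python
import json


def cross_account_read_policy(bucket: str, trusted_account_id: str) -> dict:
    """Bucket policy letting another AWS account list and read the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRead",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself (ListBucket)
                    f"arn:aws:s3:::{bucket}/*",    # every object in it (GetObject)
                ],
            }
        ],
    }


# Placeholder bucket name and account id, for illustration only.
policy = cross_account_read_policy("my-example-bucket", "111122223333")
print(json.dumps(policy, indent=2))
```

The other account still has to grant its own users or roles permission to use the bucket through IAM, which is why the bullet above says "Bucket Policies and IAM".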

S3 Cross-Region Replication

  • Versioning must be enabled on both source and destination buckets
  • Source and destination Regions must be different
  • Existing objects in the bucket are not replicated automatically
  • Delete markers aren’t replicated
  • Deleting individual versions or delete markers will not be replicated
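
The first two prerequisites are easy to turn into a pre-flight check. The sketch below uses a made-up `BucketInfo` record instead of real API calls; in practice you would fill it in from the bucket's location and versioning status.

```python
from dataclasses import dataclass


@dataclass
class BucketInfo:
    """Hypothetical summary of the bucket settings that matter for replication."""
    name: str
    region: str
    versioning_enabled: bool


def replication_problems(source: BucketInfo, destination: BucketInfo) -> list:
    """Return the reasons cross-region replication cannot be set up (empty if none)."""
    problems = []
    if not source.versioning_enabled:
        problems.append(f"versioning is not enabled on source bucket {source.name}")
    if not destination.versioning_enabled:
        problems.append(f"versioning is not enabled on destination bucket {destination.name}")
    if source.region == destination.region:
        problems.append("source and destination buckets are in the same Region")
    return problems


# Made-up buckets, for illustration only.
source = BucketInfo("photos-eu", "eu-west-1", versioning_enabled=True)
destination = BucketInfo("photos-backup", "eu-west-1", versioning_enabled=False)
for problem in replication_problems(source, destination):
    print(problem)
```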

S3 Transfer Acceleration

  • S3 Transfer Acceleration utilizes CloudFront Edge Network to accelerate S3 uploads
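
From the client's point of view, the main change is the endpoint you upload to. A tiny sketch, assuming acceleration is already enabled on the bucket (the helper and the bucket name are my own):

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket:
        # Acceleration requires DNS-compliant bucket names without periods.
        raise ValueError("bucket names containing periods are not supported")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("my-example-bucket"))
```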

CloudFront – CDN service

  • Edge Location
    • the location where content is cached (separate from an AWS Region or AZ)
    • edge locations aren’t read-only; you can write (PUT) objects to them too
  • Origin
    • the origin of all the files that the CDN will distribute
      • e.g.: S3 bucket, EC2 instance, Elastic Load Balancer or Route 53
  • Distribution – the name given to the CDN, which consists of a collection of edge locations
    • Types:
      • Web Distribution -> typically used for websites
      • RTMP -> typically for media streaming
  • Objects are cached for a set TTL (time to live)
  • CloudFront invalidation
    • lets you remove objects from CloudFront caches (you’ll be charged for it)
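
How TTL and invalidation interact is easiest to see in a toy model. The class below is my own in-memory sketch of a single edge location, not CloudFront's API: it serves from its cache until the TTL runs out, and an invalidation forces the next request back to the origin.

```python
class ToyEdgeCache:
    """Very small imitation of one edge location in front of an origin."""

    def __init__(self, origin: dict, ttl_seconds: int):
        self.origin = origin            # path -> content, stands in for the origin
        self.ttl_seconds = ttl_seconds
        self._cache = {}                # path -> (content, expires_at)
        self.origin_fetches = 0

    def get(self, path: str, now: float) -> str:
        """Serve from cache while the entry is fresh, otherwise go to the origin."""
        entry = self._cache.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]
        content = self.origin[path]
        self.origin_fetches += 1
        self._cache[path] = (content, now + self.ttl_seconds)
        return content

    def invalidate(self, path: str) -> None:
        """Drop a cached object before its TTL expires."""
        self._cache.pop(path, None)


origin = {"/index.html": "v1"}
edge = ToyEdgeCache(origin, ttl_seconds=60)

before_update = edge.get("/index.html", now=0)     # first request hits the origin
origin["/index.html"] = "v2"                       # the origin changes
still_cached = edge.get("/index.html", now=30)     # TTL not expired: old copy served
edge.invalidate("/index.html")
after_invalidation = edge.get("/index.html", now=31)
print(before_update, still_cached, after_invalidation, edge.origin_fetches)
```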

AWS Storage Gateway

  • On-premises access to virtually unlimited cloud storage
  • Hybrid cloud storage service
  • Types:
    • File Gateway
      • for flat files
      • stored directly on S3
    • Volume Gateway
      • stored volumes
        • the entire dataset is stored on-site and is asynchronously backed up to S3
      • cached volumes
        • the entire dataset is stored on S3 and the most frequently accessed data is cached on-site
    • Gateway Virtual Tape Library (Tape Gateway)

Athena and Macie

Amazon Athena (SQL queries-based)

  • Interactive query service
  • Lets you analyze and query data located in S3 buckets
  • Uses standard SQL
  • Serverless
  • Commonly used for log analysis
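
To give a feel for the log analysis use case, here is a sketch of the parameters for an Athena query, in the shape boto3's `start_query_execution` accepts. The table, database and results bucket are hypothetical, and the call itself is left as a comment.

```python
def athena_query_params(sql: str, database: str, output_location: str) -> dict:
    """Assemble keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes the query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table of web access logs stored in S3.
sql = """
SELECT status, COUNT(*) AS requests
FROM access_logs
WHERE status >= 500
GROUP BY status
ORDER BY requests DESC
""".strip()

params = athena_query_params(sql, "weblogs", "s3://my-example-athena-results/")
print(params["QueryString"])

# With real credentials this would be submitted roughly like:
#   boto3.client("athena").start_query_execution(**params)
```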

Amazon Macie (for security)

  • Security service
  • ML and NLP based
  • Analyzes S3 objects for sensitive data such as PII (Personally Identifiable Information)
  • Can be used to analyze CloudTrail logs

Resources

Read it to gain more knowledge about Amazon S3:
