Cloud Architecture and AWS Best Practices: Building Scalable Infrastructure

Comprehensive guide to designing cloud-native applications and AWS infrastructure with real-world patterns and cost optimization

Hello! I’m Ahmet Zeybek, a full stack developer with extensive experience in cloud architecture and AWS infrastructure. Moving to the cloud has transformed how we build and scale applications, offering unprecedented flexibility and power. In this comprehensive guide, I’ll share the patterns and practices that have helped me design cost-effective, scalable, and reliable cloud architectures.

Cloud Architecture Fundamentals

1. Well-Architected Framework

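The AWS Well-Architected Framework organizes its guidance into pillars. The five covered below are the original pillars (AWS added a sixth, Sustainability, in 2021):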

Operational Excellence

  • Automate everything: Infrastructure as Code (IaC)
  • Monitor and log: Comprehensive observability
  • Incident response: Runbooks and automation

Security

  • Defense in depth: Multiple security layers
  • Least privilege: Minimal required permissions
  • Encryption everywhere: Data at rest and in transit

Reliability

  • Fault tolerance: Design for failure
  • Auto scaling: Handle traffic spikes
  • Disaster recovery: Multi-region backup
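In code, "design for failure" usually starts with retrying transient errors without every client retrying in lockstep. Below is a minimal sketch of exponential backoff with full jitter, where each delay is drawn uniformly between zero and min(cap, base * 2^attempt). This is the scheme described on the AWS Architecture Blog. The function names and default values are illustrative assumptions. The AWS SDKs already ship their own retry strategies, so this is meant for your own calls to flaky dependencies.

```typescript
// Delay for a given attempt using "full jitter": uniform in [0, min(cap, base * 2^attempt)).
function fullJitterDelay(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry an async operation; rethrows the last error once maxAttempts is exhausted.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  { maxAttempts = 5, baseMs = 100, capMs = 5000 } = {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No point sleeping after the final attempt
      if (attempt < maxAttempts - 1) {
        await sleep(fullJitterDelay(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}
```

Jitter matters because synchronized retries from many Lambda instances can turn a brief dependency outage into sustained overload. Randomizing the wait spreads those retries out.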

Performance Efficiency

  • Right-sized resources: Don’t over-provision
  • Caching strategies: Reduce database load
  • CDN usage: Global content delivery

Cost Optimization

  • Demand-based scaling: Pay only for what you use
  • Resource optimization: Right-size instances
  • Storage tiering: Use appropriate storage classes

Infrastructure as Code

2. AWS CDK for Infrastructure

A complete application stack in AWS CDK (TypeScript), covering networking, database, compute, API, CDN and storage:

import * as cdk from 'aws-cdk-lib'
import { Construct } from 'constructs'
import * as ec2 from 'aws-cdk-lib/aws-ec2'
import * as rds from 'aws-cdk-lib/aws-rds'
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as apigateway from 'aws-cdk-lib/aws-apigateway'
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront'
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins'
import * as s3 from 'aws-cdk-lib/aws-s3'
export class MyAppStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props)
// VPC with proper networking
const vpc = new ec2.Vpc(this, 'MyAppVPC', {
maxAzs: 3,
natGateways: 1,
subnetConfiguration: [
{
cidrMask: 24,
name: 'Public',
subnetType: ec2.SubnetType.PUBLIC,
},
{
cidrMask: 24,
name: 'Private',
subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, // replaces the deprecated PRIVATE_WITH_NAT
},
{
cidrMask: 24,
name: 'Database',
subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
},
],
})
// RDS PostgreSQL database
const database = new rds.DatabaseInstance(this, 'MyAppDB', {
engine: rds.DatabaseInstanceEngine.postgres({ version: rds.PostgresEngineVersion.VER_15 }),
instanceType: ec2.InstanceType.of(ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.MICRO),
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
databaseName: 'myapp',
allocatedStorage: 20,
maxAllocatedStorage: 100,
storageEncrypted: true,
backupRetention: cdk.Duration.days(7),
deletionProtection: true,
monitoringInterval: cdk.Duration.seconds(60),
enablePerformanceInsights: true,
})
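// Lambda functions (security group comes from the helper at the bottom of the file)
const lambdaSecurityGroup = createLambdaSecurityGroup(this, vpc)
const apiHandler = new lambda.Function(this, 'ApiHandler', {
runtime: lambda.Runtime.NODEJS_20_X,
code: lambda.Code.fromAsset('../lambda/dist'),
handler: 'index.handler',
timeout: cdk.Duration.seconds(30),
memorySize: 512,
environment: {
// Pass the secret's ARN, not its value: environment variables are readable by anyone
// who can view the function configuration. The handler fetches credentials at runtime.
DB_SECRET_ARN: database.secret!.secretArn,
NODE_ENV: 'production',
},
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
securityGroups: [lambdaSecurityGroup],
})
// Let the function read the generated credentials and reach the PostgreSQL port
database.secret!.grantRead(apiHandler)
database.connections.allowDefaultPortFrom(lambdaSecurityGroup, 'Allow PostgreSQL from Lambda')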
// API Gateway
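const api = new apigateway.RestApi(this, 'MyAppAPI', {
restApiName: 'MyApp API',
description: 'API for MyApp',
// Stage logging needs an account-level CloudWatch Logs role; let CDK create it
cloudWatchRole: true,
deployOptions: {
stageName: 'prod',
// Logs full requests and responses; consider disabling if payloads hold personal data
dataTraceEnabled: true,
loggingLevel: apigateway.MethodLoggingLevel.INFO,
metricsEnabled: true,
},
})
// A RestApi must define at least one method; route the /users resources to the Lambda
const integration = new apigateway.LambdaIntegration(apiHandler)
const users = api.root.addResource('users')
users.addMethod('GET', integration)
users.addMethod('POST', integration)
const user = users.addResource('{id}')
user.addMethod('GET', integration)
user.addMethod('PUT', integration)
user.addMethod('DELETE', integration)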
// CloudFront distribution
const distribution = new cloudfront.Distribution(this, 'MyAppCDN', {
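defaultBehavior: {
origin: new origins.RestApiOrigin(api),
viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
// API traffic is dynamic: forward every HTTP method and don't cache by default
allowedMethods: cloudfront.AllowedMethods.ALLOW_ALL,
cachePolicy: cloudfront.CachePolicy.CACHING_DISABLED,
// Pass query strings and headers through, but let API Gateway see its own Host header
originRequestPolicy: cloudfront.OriginRequestPolicy.ALL_VIEWER_EXCEPT_HOST_HEADER,
compress: true,
},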
comment: 'CDN for MyApp',
enabled: true,
httpVersion: cloudfront.HttpVersion.HTTP2_AND_3,
priceClass: cloudfront.PriceClass.PRICE_CLASS_ALL,
})
// S3 bucket for static assets
const assetsBucket = new s3.Bucket(this, 'AssetsBucket', {
bucketName: `myapp-assets-${this.account}-${this.region}`,
publicReadAccess: false,
blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
encryption: s3.BucketEncryption.S3_MANAGED,
versioned: true,
lifecycleRules: [
{
id: 'Transition to IA',
enabled: true,
transitions: [
{
storageClass: s3.StorageClass.INFREQUENTLY_ACCESSED,
transitionAfter: cdk.Duration.days(30),
},
{
storageClass: s3.StorageClass.GLACIER,
transitionAfter: cdk.Duration.days(90),
},
],
},
],
})
// Outputs
new cdk.CfnOutput(this, 'CDNURL', {
value: `https://${distribution.distributionDomainName}`,
description: 'CloudFront distribution URL',
})
new cdk.CfnOutput(this, 'DatabaseEndpoint', {
value: database.instanceEndpoint.hostname,
description: 'Database endpoint',
})
}
}
function createLambdaSecurityGroup(scope: Construct, vpc: ec2.IVpc): ec2.SecurityGroup {
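// allowAllOutbound: false, otherwise the explicit egress rules below are ignored
const sg = new ec2.SecurityGroup(scope, 'LambdaSG', { vpc, allowAllOutbound: false })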
// Allow outbound to database
sg.addEgressRule(ec2.Peer.ipv4(vpc.vpcCidrBlock), ec2.Port.tcp(5432), 'Allow database connections')
// Allow outbound to internet (for external APIs)
sg.addEgressRule(ec2.Peer.anyIpv4(), ec2.Port.tcp(443), 'Allow HTTPS outbound')
return sg
}
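RDS-generated secrets in Secrets Manager hold JSON fields such as `username`, `password`, `host`, `port` and `dbname`. They do not hold a ready-made connection string. If a handler needs a PostgreSQL URL, it can build one after fetching the secret at runtime (for example with Secrets Manager's GetSecretValue). The sketch below covers only that parsing step. `RdsSecret` and `toPostgresUrl` are hypothetical names. Credentials are percent-encoded because generated passwords may contain characters that are reserved in URLs.

```typescript
// Shape of the JSON stored in an RDS-generated secret (only the fields used here).
interface RdsSecret {
  username: string;
  password: string;
  host: string;
  port: number;
  dbname?: string;
}

// Build a postgresql:// URL, percent-encoding the credentials and database name.
function toPostgresUrl(secretString: string, fallbackDb = "postgres"): string {
  const secret = JSON.parse(secretString) as RdsSecret;
  const user = encodeURIComponent(secret.username);
  const pass = encodeURIComponent(secret.password);
  const db = encodeURIComponent(secret.dbname ?? fallbackDb);
  return `postgresql://${user}:${pass}@${secret.host}:${secret.port}/${db}`;
}
```

Keep the resulting URL in memory for the lifetime of the execution environment rather than writing it back into an environment variable.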

Serverless Architecture

3. Event-Driven Serverless Design

Build applications that respond to events:

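// AWS Lambda handlers (event types come from the @types/aws-lambda package)
import type { S3Event, APIGatewayEvent } from 'aws-lambda'
import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
import { S3Client, HeadObjectCommand } from '@aws-sdk/client-s3'
import { SNSClient, PublishCommand } from '@aws-sdk/client-sns'
// Create SDK clients once per execution environment so warm invocations reuse them
const dynamoClient = new DynamoDBClient({})
const s3Client = new S3Client({})
const snsClient = new SNSClient({})
// Process file upload event
export const processFileUpload = async (event: S3Event) => {
for (const record of event.Records) {
const bucket = record.s3.bucket.name
// Object keys arrive URL-encoded, with spaces encoded as '+'
const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
try {
// Read object metadata (size, content type) without downloading the body
const fileData = await s3Client.send(new HeadObjectCommand({ Bucket: bucket, Key: key }))
console.log(`Processing ${key} (${fileData.ContentLength} bytes)`)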
// Process file based on type
if (key.endsWith('.csv')) {
await processCSVFile(bucket, key)
} else if (key.endsWith('.json')) {
await processJSONFile(bucket, key)
}
// Update processing status
await updateProcessingStatus(key, 'completed')
// Send notification
await snsClient.send(
new PublishCommand({
TopicArn: process.env.NOTIFICATION_TOPIC_ARN,
Message: JSON.stringify({
type: 'FILE_PROCESSED',
fileKey: key,
status: 'success',
}),
})
)
} catch (error) {
console.error('File processing error:', error)
// Update status to failed
await updateProcessingStatus(key, 'failed')
// Send error notification
await snsClient.send(
new PublishCommand({
TopicArn: process.env.NOTIFICATION_TOPIC_ARN,
Message: JSON.stringify({
type: 'FILE_PROCESSING_ERROR',
fileKey: key,
error: error instanceof Error ? error.message : String(error),
}),
})
)
}
}
}
// API Gateway handler
export const apiHandler = async (event: APIGatewayEvent) => {
// `resource` is the route template (e.g. /users/{id}); `path` is the concrete URL (/users/42)
const { httpMethod, resource, body } = event
try {
switch (`${httpMethod} ${resource}`) {
case 'GET /users':
return await getUsers()
case 'POST /users':
return await createUser(JSON.parse(body ?? '{}'))
case 'GET /users/{id}':
return await getUser(event.pathParameters?.id)
case 'PUT /users/{id}':
return await updateUser(event.pathParameters?.id, JSON.parse(body ?? '{}'))
case 'DELETE /users/{id}':
return await deleteUser(event.pathParameters?.id)
default:
return {
statusCode: 404,
body: JSON.stringify({ error: 'Not found' }),
}
}
} catch (error) {
console.error('API error:', error)
return {
statusCode: 500,
body: JSON.stringify({
error: 'Internal server error',
message: process.env.NODE_ENV === 'development' && error instanceof Error ? error.message : 'Something went wrong',
}),
}
}
}
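// Order workflow with compensating actions (a saga). In production each step would
// usually be a separate task in a Step Functions state machine with Retry/Catch;
// this single-function version shows the same ordering and rollback logic.
interface OrderWorkflowEvent {
orderId: string
}
export const orderProcessingWorkflow = async (event: OrderWorkflowEvent) => {
const { orderId } = event
let inventoryReserved = false
try {
// 1. Validate order
await validateOrder(orderId)
// 2. Check inventory
const inventoryAvailable = await checkInventory(orderId)
if (!inventoryAvailable) {
await updateOrderStatus(orderId, 'CANCELLED')
return { status: 'cancelled', reason: 'insufficient_inventory' }
}
// 3. Reserve inventory before charging, so the same stock can't be sold twice
await reserveInventory(orderId)
inventoryReserved = true
// 4. Process payment; release the reservation if it is declined
const paymentResult = await processPayment(orderId)
if (!paymentResult.success) {
await releaseInventory(orderId)
inventoryReserved = false
await updateOrderStatus(orderId, 'PAYMENT_FAILED')
return { status: 'failed', reason: 'payment_failed' }
}
// 5. Update order status
await updateOrderStatus(orderId, 'CONFIRMED')
} catch (error) {
console.error('Order processing error:', error)
// Compensating actions: only undo steps that actually completed
// (a payment that already succeeded would also need a refund step)
await updateOrderStatus(orderId, 'FAILED')
if (inventoryReserved) {
await releaseInventory(orderId)
}
throw error
}
// 6. Send the confirmation email outside the try: a failed email must not roll back a paid order
try {
await sendOrderConfirmation(orderId)
} catch (error) {
console.error('Confirmation email failed:', error)
}
return { status: 'completed', orderId }
}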
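The order workflow above is a saga. Each step that changes state has a compensating action, and when a later step fails, the completed steps are undone in reverse order. In Step Functions this logic would live in Catch transitions between states. The in-memory runner below only illustrates the compensation ordering. `SagaStep` and `runSaga` are hypothetical names, not part of any AWS SDK.

```typescript
interface SagaStep {
  name: string;
  run: () => Promise<void>;
  // Undo the step's side effect; omitted for read-only steps such as validation.
  compensate?: () => Promise<void>;
}

// Run steps in order; on failure, compensate completed steps in reverse and rethrow.
async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch (error) {
      for (const done of completed.reverse()) {
        await done.compensate?.();
      }
      throw error;
    }
  }
}
```

A real implementation would also retry compensations that themselves fail, because a compensation that gives up leaves the system inconsistent. Step Functions' Retry fields handle that declaratively.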

Database Architecture

4. Multi-Region Database Design

Ensure high availability and disaster recovery:

// Aurora PostgreSQL cluster that can act as the primary of an Aurora Global Database
const globalDatabase = new rds.DatabaseCluster(this, 'GlobalDB', {
engine: rds.DatabaseClusterEngine.auroraPostgres({
version: rds.AuroraPostgresEngineVersion.VER_15_4,
}),
// Global databases are documented for memory-optimized (db.r*) classes, not burstable ones
writer: rds.ClusterInstance.provisioned('Writer', {
instanceType: ec2.InstanceType.of(ec2.InstanceClass.MEMORY6_GRAVITON, ec2.InstanceSize.LARGE),
enablePerformanceInsights: true,
}),
// In-region reader for read scaling and failover
readers: [
rds.ClusterInstance.provisioned('Reader', {
instanceType: ec2.InstanceType.of(ec2.InstanceClass.MEMORY6_GRAVITON, ec2.InstanceSize.LARGE),
promotionTier: 2,
enablePerformanceInsights: true,
}),
],
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
storageEncrypted: true,
backup: {
retention: cdk.Duration.days(7),
preferredWindow: '03:00-04:00',
},
monitoringInterval: cdk.Duration.seconds(60),
})
// Readers in other regions are not ClusterInstances: attach this cluster to an
// rds.CfnGlobalCluster and deploy a secondary cluster from a stack in each extra region.
// ElastiCache for Redis
const redisCluster = new elasticache.CfnServerlessCache(this, 'RedisCluster', {
engine: 'redis',
serverlessCacheName: 'myapp-redis',
description: 'Redis cluster for session storage and caching',
securityGroupIds: [redisSecurityGroup.securityGroupId],
subnetIds: vpc.privateSubnets.map((subnet) => subnet.subnetId),
cacheUsageLimits: {
dataStorage: {
maximum: 10,
unit: 'GB',
},
ecpuPerSecond: {
maximum: 10000,
},
},
dailySnapshotTime: '05:00',
majorEngineVersion: '7',
})
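An Aurora cluster exposes a cluster (writer) endpoint and a reader endpoint that spreads connections across the replicas. Applications usually route each query based on whether it writes. Below is a minimal sketch. The endpoint hostnames and the `routeQuery` helper are assumptions, and the leading-keyword check is deliberately naive: locking reads, and any read that must see a write it just made (replica lag), belong on the writer.

```typescript
interface ClusterEndpoints {
  writer: string; // cluster endpoint: always resolves to the current primary
  reader: string; // reader endpoint: balances connections across Aurora Replicas
}

// Send plain reads to the reader endpoint; writes and locking reads go to the writer.
function routeQuery(sql: string, endpoints: ClusterEndpoints, requireFreshRead = false): string {
  const normalized = sql.trim().toUpperCase();
  const isPlainRead = normalized.startsWith("SELECT") && !normalized.includes("FOR UPDATE");
  return isPlainRead && !requireFreshRead ? endpoints.reader : endpoints.writer;
}
```

In practice this decision is usually made per connection pool (one pool per endpoint) rather than per statement. The routing rule stays the same either way.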

Security Architecture

5. Zero-Trust Security Model

Implement comprehensive security:

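// IAM policies with least privilege
const lambdaExecutionRole = new iam.Role(this, 'LambdaExecutionRole', {
assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
// The VPC variant grants CloudWatch Logs access plus the ENI permissions a VPC-attached function needs
managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSLambdaVPCAccessExecutionRole')],
inlinePolicies: {
// Read the database credentials. (rds-db:connect applies only to IAM database
// authentication and takes a dbuser ARN, not a secret ARN.)
DatabaseAccess: new iam.PolicyDocument({
statements: [
new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
actions: ['secretsmanager:GetSecretValue'],
resources: [database.secret!.secretArn],
}),
],
}),
// inlinePolicies values must be PolicyDocuments, not bare statements
S3Access: new iam.PolicyDocument({
statements: [
new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
actions: ['s3:GetObject', 's3:PutObject'],
resources: [`${assetsBucket.bucketArn}/*`],
}),
],
}),
},
})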
// Gateway VPC endpoints keep S3 and DynamoDB traffic on the AWS network (and off the NAT gateway)
const dynamodbEndpoint = new ec2.GatewayVpcEndpoint(this, 'DynamoDBEndpoint', {
service: ec2.GatewayVpcEndpointAwsService.DYNAMODB,
vpc,
subnets: [{ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }],
})
const s3Endpoint = new ec2.GatewayVpcEndpoint(this, 'S3Endpoint', {
service: ec2.GatewayVpcEndpointAwsService.S3,
vpc,
subnets: [{ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }],
})
// Security groups with specific rules
const databaseSecurityGroup = new ec2.SecurityGroup(this, 'DatabaseSG', {
vpc,
description: 'Security group for database',
allowAllOutbound: false,
})
// Only allow connections from application security group
databaseSecurityGroup.addIngressRule(applicationSecurityGroup, ec2.Port.tcp(5432), 'Allow PostgreSQL connections from application')
// WAF for API Gateway
const webACL = new wafv2.CfnWebACL(this, 'MyAppWebACL', {
name: 'MyAppWebACL',
scope: 'REGIONAL',
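// Allow by default; the rules below block specific traffic (a block default would reject every request)
defaultAction: { allow: {} },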
rules: [
{
name: 'RateLimit',
priority: 1,
action: { block: {} },
statement: {
rateBasedStatement: {
limit: 1000, // max requests per IP within the default 5-minute evaluation window
aggregateKeyType: 'IP',
},
},
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'RateLimitRule',
},
},
{
name: 'SQLInjection',
priority: 2,
action: { block: {} },
statement: {
sqliMatchStatement: {
fieldToMatch: { body: {} },
textTransformations: [
{ priority: 0, type: 'LOWERCASE' },
{ priority: 1, type: 'URL_DECODE' },
],
},
},
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'SQLInjectionRule',
},
},
],
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'MyAppWebACL',
},
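})
// The web ACL has no effect until it is associated with the API's stage
new wafv2.CfnWebACLAssociation(this, 'MyAppWebACLAssociation', {
resourceArn: api.deploymentStage.stageArn,
webAclArn: webACL.attrArn,
})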
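Least privilege is easier to reason about once IAM's core evaluation rule is clear. An explicit Deny always wins. Otherwise any matching Allow grants access. Anything left over is implicitly denied. The toy evaluator below applies that rule to a single identity policy, with `*` and `?` wildcards. It is a simplification: conditions, resource policies, permission boundaries and SCPs are ignored, and the types are hypothetical.

```typescript
type Effect = "Allow" | "Deny";

interface Statement {
  effect: Effect;
  actions: string[];
  resources: string[];
}

// Convert an IAM-style pattern (* = any run of characters, ? = one character) to a RegExp.
function patternToRegExp(pattern: string, caseInsensitive: boolean): RegExp {
  const body = pattern
    .split("")
    .map((ch) => (ch === "*" ? ".*" : ch === "?" ? "." : ch.replace(/[.+^${}()|[\]\\]/g, "\\$&")))
    .join("");
  return new RegExp(`^${body}$`, caseInsensitive ? "i" : "");
}

const matchesAny = (value: string, patterns: string[], caseInsensitive: boolean): boolean =>
  patterns.some((p) => patternToRegExp(p, caseInsensitive).test(value));

// Explicit Deny > Allow > implicit deny. Action names are case-insensitive; ARNs are not.
function isAllowed(statements: Statement[], action: string, resource: string): boolean {
  const applicable = statements.filter(
    (s) => matchesAny(action, s.actions, true) && matchesAny(resource, s.resources, false),
  );
  if (applicable.some((s) => s.effect === "Deny")) return false;
  return applicable.some((s) => s.effect === "Allow");
}
```

The same rule explains why broad grants such as `actions: ['backup:*']` on `resources: ['*']` are risky. Every action they match is allowed unless some other statement explicitly denies it.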

Monitoring and Observability

6. Comprehensive Monitoring Setup

Monitor your entire infrastructure:

// CloudWatch dashboards
const dashboard = new cloudwatch.Dashboard(this, 'MyAppDashboard', {
dashboardName: 'MyApp-Monitoring-Dashboard',
defaultInterval: cdk.Duration.hours(24),
})
// Add widgets to dashboard
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'API Gateway Latency',
left: [
new cloudwatch.Metric({
namespace: 'AWS/ApiGateway',
metricName: 'Latency',
dimensionsMap: { ApiName: api.restApiName },
}),
],
}),
new cloudwatch.GraphWidget({
title: 'Lambda Errors',
left: [
new cloudwatch.Metric({
namespace: 'AWS/Lambda',
metricName: 'Errors',
dimensionsMap: { FunctionName: apiHandler.functionName },
statistic: 'Sum', // total errors per period (Metric defaults to Average)
}),
],
}),
new cloudwatch.GraphWidget({
title: 'Database Connections',
left: [
new cloudwatch.Metric({
namespace: 'AWS/RDS',
metricName: 'DatabaseConnections',
dimensionsMap: { DBInstanceIdentifier: database.instanceIdentifier },
}),
],
})
)
// CloudWatch alarms
const highLatencyAlarm = new cloudwatch.Alarm(this, 'HighLatencyAlarm', {
alarmName: 'MyApp-HighLatency',
alarmDescription: 'API Gateway latency is too high',
metric: new cloudwatch.Metric({
namespace: 'AWS/ApiGateway',
metricName: 'Latency',
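dimensionsMap: { ApiName: api.restApiName },
// Alarm on tail latency: an average can look healthy while some requests take seconds
statistic: 'p99',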
}),
threshold: 1000, // 1 second
evaluationPeriods: 2,
comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
})
// SNS topic for notifications
const alarmTopic = new sns.Topic(this, 'AlarmTopic', {
topicName: 'MyApp-Alarms',
displayName: 'MyApp Alarm Notifications',
})
// Subscribe email to alarms
alarmTopic.addSubscription(new subscriptions.EmailSubscription('alerts@myapp.com'))
// Connect alarm to topic
highLatencyAlarm.addAlarmAction(new actions.SnsAction(alarmTopic))
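API Gateway reports `Latency` in milliseconds, so the threshold of 1000 is one second. With `evaluationPeriods: 2` and no `datapointsToAlarm`, the alarm fires only when both of the two most recent periods breach. `NOT_BREACHING` treats gaps as good data. Below is a simplified model of that M-out-of-N rule, useful for estimating how quickly an alarm reacts. CloudWatch's real evaluation (evaluation ranges, late-arriving data) is more involved, and `evaluateAlarm` is a hypothetical helper.

```typescript
type AlarmState = "OK" | "ALARM";

// Evaluate the latest `evaluationPeriods` datapoints (null = missing) for a
// GREATER_THAN_THRESHOLD alarm, counting missing data as not breaching.
function evaluateAlarm(
  datapoints: (number | null)[],
  threshold: number,
  evaluationPeriods: number,
  datapointsToAlarm: number = evaluationPeriods,
): AlarmState {
  const recent = datapoints.slice(-evaluationPeriods);
  const breaching = recent.filter((v) => v !== null && v > threshold).length;
  return breaching >= datapointsToAlarm ? "ALARM" : "OK";
}
```

With 5-minute periods, this configuration needs at least ten minutes of sustained slowness before paging. Setting `datapointsToAlarm` below `evaluationPeriods` (for example 2 out of 3) makes the alarm tolerate a single good period in the middle of an incident.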

Cost Optimization

7. Cost Optimization Strategies

Reduce cloud costs while maintaining performance:

// Auto scaling configuration
const autoScalingGroup = new autoscaling.AutoScalingGroup(this, 'WebServerASG', {
vpc,
instanceType: ec2.InstanceType.of(ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.MICRO),
machineImage: ec2.MachineImage.latestAmazonLinux2023(),
minCapacity: 1,
maxCapacity: 10,
desiredCapacity: 2,
cooldown: cdk.Duration.minutes(5),
})
// Scaling policies and schedules are added with methods, not constructor props.
// Target tracking adds or removes instances to hold average CPU near 70%.
autoScalingGroup.scaleOnCpuUtilization('CpuTargetTracking', {
targetUtilizationPercent: 70,
})
// Scheduled scaling for predictable traffic
autoScalingGroup.scaleOnSchedule('ScaleUpForBusinessHours', {
schedule: autoscaling.Schedule.cron({ hour: '9', minute: '0' }),
timeZone: 'America/New_York',
minCapacity: 3,
maxCapacity: 8,
desiredCapacity: 5,
})
autoScalingGroup.scaleOnSchedule('ScaleDownAfterBusinessHours', {
schedule: autoscaling.Schedule.cron({ hour: '18', minute: '0' }),
timeZone: 'America/New_York',
minCapacity: 1,
maxCapacity: 3,
desiredCapacity: 2,
})
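// Spot capacity for interruption-tolerant workloads. (AWS describes Spot Fleet as a legacy
// API and recommends EC2 Auto Scaling or EC2 Fleet for new designs.)
const spotFleet = new ec2.CfnSpotFleet(this, 'SpotFleet', {
spotFleetRequestConfigData: {
iamFleetRole: fleetRole.roleArn,
// Prefer the cheapest pools with the lowest interruption risk
allocationStrategy: 'priceCapacityOptimized',
targetCapacity: 10,
// No spotPrice: leaving it unset caps the price at the On-Demand rate, as AWS recommends.
// Several interchangeable instance types give the allocation strategy pools to choose from.
launchSpecifications: ['m5.large', 'm5a.large', 'm6i.large'].map((instanceType) => ({
instanceType,
imageId: 'ami-12345678', // placeholder AMI ID
keyName: 'my-key-pair',
securityGroups: [{ groupId: webSecurityGroup.securityGroupId }],
subnetId: vpc.publicSubnets[0].subnetId,
weightedCapacity: 1,
})),
},
})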
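// S3 Intelligent-Tiering: S3 itself moves objects between the Frequent, Infrequent (after
// 30 days without access) and Archive Instant Access (after 90 days) tiers. The bucket
// configuration only opts in to the optional asynchronous archive tiers.
const intelligentTieringBucket = new s3.Bucket(this, 'IntelligentTieringBucket', {
bucketName: `myapp-intelligent-${this.account}`,
intelligentTieringConfigurations: [
{
name: 'EntireBucket',
archiveAccessTierTime: cdk.Duration.days(365),
},
],
// Objects must be stored in the INTELLIGENT_TIERING class; move them there immediately
lifecycleRules: [
{
id: 'MoveToIntelligentTiering',
transitions: [
{
storageClass: s3.StorageClass.INTELLIGENT_TIERING,
transitionAfter: cdk.Duration.days(0),
},
],
},
],
})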
// Cost and usage report
const costReport = new s3.Bucket(this, 'CostReportBucket', {
bucketName: `myapp-cost-reports-${this.account}`,
lifecycleRules: [
{
id: 'DeleteOldReports',
enabled: true,
expiration: cdk.Duration.days(2555), // 7 years
},
],
})
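// The billing service must be able to inspect and write to the report bucket
costReport.addToResourcePolicy(
new iam.PolicyStatement({
principals: [new iam.ServicePrincipal('billingreports.amazonaws.com')],
actions: ['s3:GetBucketAcl', 's3:GetBucketPolicy', 's3:PutObject'],
resources: [costReport.bucketArn, costReport.arnForObjects('*')],
})
)
// Enable the (legacy) Cost and Usage Report. This resource must be deployed in us-east-1;
// AWS now points new setups toward Data Exports (CUR 2.0).
new cur.CfnReportDefinition(this, 'CostAndUsageReport', {
reportName: 'MyAppCostAndUsageReport',
timeUnit: 'DAILY',
format: 'Parquet',
compression: 'Parquet',
additionalSchemaElements: ['RESOURCES'],
s3Bucket: costReport.bucketName,
s3Prefix: 'cost-reports',
s3Region: this.region,
refreshClosedReports: true,
// Required property: overwrite earlier versions of each report in place
reportVersioning: 'OVERWRITE_REPORT',
})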
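Target tracking behaves roughly like a proportional controller. If average CPU is 90% across 4 instances with a 70% target, the group needs about ceil(4 * 90 / 70) = 6 instances. The sketch below makes that estimate and clamps it to the group's minimum and maximum. It is a simplification: the real service also applies instance warm-up and cooldowns, and it scales in more gradually than it scales out. `desiredCapacityFor` is a hypothetical helper.

```typescript
interface CapacityLimits {
  min: number;
  max: number;
}

// Proportional estimate of the capacity that brings the average metric back to the target.
function desiredCapacityFor(
  currentCapacity: number,
  currentMetric: number,
  targetMetric: number,
  limits: CapacityLimits,
): number {
  const raw = Math.ceil((currentCapacity * currentMetric) / targetMetric);
  return Math.min(limits.max, Math.max(limits.min, raw));
}
```

The clamp is also why scheduled actions matter. During business hours the schedule raises the minimum to 3, so target tracking can't shrink the group below that even when CPU is idle.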

Multi-Region Architecture

8. Global Infrastructure Design

Build for global scale:

// Multi-region setup: GlobalStack owns the shared, global resources. Instantiate it once from
// the app with an explicit env, e.g. { account, region: 'us-east-1' }; regional application
// stacks (such as MyAppStack) are created once per region at the app level, not in here.
export class GlobalStack extends cdk.Stack {
constructor(scope: Construct, id: string, props: cdk.StackProps) {
super(scope, id, props)
// DynamoDB global table: the stack's own region (us-east-1) holds the primary replica,
// so `replicas` lists only the additional regions
const globalTable = new dynamodb.TableV2(this, 'GlobalTable', {
tableName: 'MyApp-GlobalTable',
partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'sk', type: dynamodb.AttributeType.STRING },
billing: dynamodb.Billing.onDemand(),
replicas: [{ region: 'eu-west-1' }, { region: 'ap-southeast-1' }],
pointInTimeRecovery: true,
})
// Route 53 for global routing
const hostedZone = new route53.HostedZone(this, 'MyAppZone', {
zoneName: 'myapp.com',
})
// CloudFront with Lambda@Edge. edgeFunction must be deployed in us-east-1;
// cloudfront.experimental.EdgeFunction handles that when used from other regions.
const distribution = new cloudfront.Distribution(this, 'GlobalCDN', {
defaultBehavior: {
// Origin Access Control keeps the bucket private (S3Origin is deprecated)
origin: origins.S3BucketOrigin.withOriginAccessControl(assetsBucket),
edgeLambdas: [
{
functionVersion: edgeFunction.currentVersion,
eventType: cloudfront.LambdaEdgeEventType.ORIGIN_REQUEST,
},
],
},
})
// Health check for failover routing: unhealthy after 3 consecutive failures, 30 s apart
const healthCheck = new route53.CfnHealthCheck(this, 'GlobalHealthCheck', {
healthCheckConfig: {
type: 'HTTPS',
fullyQualifiedDomainName: 'api.myapp.com',
port: 443,
resourcePath: '/health',
failureThreshold: 3,
requestInterval: 30,
},
})
}
}
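With `failureThreshold: 3` and a 30-second interval, an endpoint flips to unhealthy only after three consecutive failed checks (roughly 90 seconds). It needs the same number of consecutive passes to flip back. Failover routing then answers with the secondary. Below is a single-checker model of that decision. Route 53 actually aggregates results from many checker locations, and `HealthTracker` and `failoverTarget` are hypothetical names.

```typescript
type Status = "healthy" | "unhealthy";

// Change status only after `failureThreshold` consecutive results that disagree with it.
class HealthTracker {
  private status: Status = "healthy";
  private streak = 0;
  private readonly failureThreshold: number;

  constructor(failureThreshold: number) {
    this.failureThreshold = failureThreshold;
  }

  record(checkPassed: boolean): Status {
    const observed: Status = checkPassed ? "healthy" : "unhealthy";
    if (observed === this.status) {
      this.streak = 0; // result agrees with current status; reset the counter
    } else if (++this.streak >= this.failureThreshold) {
      this.status = observed;
      this.streak = 0;
    }
    return this.status;
  }
}

// Failover routing: serve the primary while it is healthy, otherwise the secondary.
function failoverTarget(primaryStatus: Status, primaryHost: string, secondaryHost: string): string {
  return primaryStatus === "healthy" ? primaryHost : secondaryHost;
}
```

The threshold trades detection speed for stability. A lower value fails over sooner but is more likely to flap on a single slow response.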

DevOps and Automation

9. CI/CD Pipeline with AWS

Automated deployment pipeline:

# .github/workflows/deploy.yml
name: Deploy to AWS

on:
  push:
    branches: [main]
  workflow_dispatch:

# OIDC: configure-aws-credentials needs an ID token to assume the role
permissions:
  id-token: write
  contents: read

env:
  AWS_REGION: us-east-1
  NODE_ENV: production

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm run test
      - name: Build application
        run: npm run build
      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: build-artifacts
          path: dist/

  deploy-infrastructure:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Install dependencies (aws-cdk is a devDependency)
        run: npm ci
      - name: Deploy to AWS
        run: |
          # bootstrap is idempotent; it only needs to succeed once per account/region
          npx cdk bootstrap
          npx cdk deploy --all --require-approval never

  deploy-application:
    needs: deploy-infrastructure
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Download build artifacts
        uses: actions/download-artifact@v4
        with:
          name: build-artifacts
          path: dist/
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Deploy Lambda functions
        run: |
          # Package the downloaded build and update the function
          # (assumes the CDK stack sets functionName: 'MyAppApiHandler')
          cd dist && zip -r ../api-handler.zip . && cd ..
          aws lambda update-function-code \
            --function-name MyAppApiHandler \
            --zip-file fileb://api-handler.zip
          aws lambda wait function-updated --function-name MyAppApiHandler
          # Lambda proxy integrations invoke the function's latest code,
          # so the API Gateway stage does not need a redeployment

  smoke-tests:
    needs: deploy-application
    runs-on: ubuntu-latest
    steps:
      - name: Run smoke tests
        run: |
          # Health check (retries cover transient errors right after deploy)
          curl -f --retry 5 --retry-delay 5 https://api.myapp.com/health
          # Basic API tests
          curl -f -X POST https://api.myapp.com/test \
            -H "Content-Type: application/json" \
            -d '{"test": "data"}'
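Right after a deployment, new code or DNS changes can take a few seconds to become visible, so a single request can fail spuriously. Besides curl's own `--retry` flags, one option is a small smoke-check helper that polls until the service reports healthy or the attempts run out. `waitForHealthy` and its defaults are assumptions. In CI the check could be `async () => (await fetch('https://api.myapp.com/health')).ok` (Node 18+ provides a global `fetch`). The check is injected as a parameter so the polling logic can be tested without a network.

```typescript
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `check` until it resolves true; returns the number of attempts it took.
async function waitForHealthy(
  check: () => Promise<boolean>,
  attempts = 10,
  intervalMs = 3000,
): Promise<number> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await check()) return attempt;
    } catch {
      // Network errors (connection refused, DNS not yet propagated) count as a failed attempt
    }
    if (attempt < attempts) await delay(intervalMs);
  }
  throw new Error(`Service not healthy after ${attempts} attempts`);
}
```

Failing the job by throwing, instead of merely logging, is what lets the pipeline stop (or trigger a rollback) when a deployment never becomes healthy.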

Disaster Recovery

10. Backup and Recovery Strategy

Ensure business continuity:

// Backup vault with encryption (declared first, because the plan below references it).
// kmsKey is a customer managed KMS key defined elsewhere in the stack.
const backupVault = new backup.BackupVault(this, 'MyAppBackupVault', {
backupVaultName: 'MyApp-BackupVault',
encryptionKey: kmsKey,
// Deny deletion of recovery points rather than granting broad backup:* access
blockRecoveryPointDeletion: true,
})
// Automated backup strategy (schedules are in UTC)
const backupPlan = new backup.BackupPlan(this, 'MyAppBackupPlan', {
backupPlanName: 'MyApp-BackupPlan',
backupVault,
backupPlanRules: [
new backup.BackupPlanRule({
ruleName: 'DailyBackups',
scheduleExpression: events.Schedule.cron({ hour: '2', minute: '0' }),
deleteAfter: cdk.Duration.days(30),
}),
new backup.BackupPlanRule({
ruleName: 'WeeklyBackups',
scheduleExpression: events.Schedule.cron({ weekDay: 'SUN', hour: '3', minute: '0' }),
deleteAfter: cdk.Duration.days(90),
}),
],
})
// A plan backs up nothing until resources are selected
backupPlan.addSelection('AppResources', {
resources: [backup.BackupResource.fromRdsDatabaseInstance(database)],
})
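// Cross-region replication for S3 (bucket-level replicationRules need a recent aws-cdk-lib).
// Replication requires versioning on both buckets; the destination bucket must already
// exist, with versioning enabled, in the secondary region.
const secondaryRegion = 'eu-west-1'
const replicatedBucket = new s3.Bucket(this, 'ReplicatedBucket', {
bucketName: `myapp-replicated-${this.account}`,
versioned: true,
replicationRules: [
{
id: 'CrossRegionReplication',
priority: 1,
destination: s3.Bucket.fromBucketAttributes(this, 'BackupDestination', {
bucketName: `myapp-backup-${secondaryRegion}`,
region: secondaryRegion,
}),
storageClass: s3.StorageClass.INFREQUENTLY_ACCESSED,
},
],
})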
// Disaster recovery Lambda function
const disasterRecoveryFunction = new lambda.Function(this, 'DisasterRecovery', {
runtime: lambda.Runtime.NODEJS_20_X,
code: lambda.Code.fromAsset('../lambda/disaster-recovery'),
handler: 'index.handler',
timeout: cdk.Duration.minutes(15),
environment: {
PRIMARY_REGION: this.region,
SECONDARY_REGION: secondaryRegion,
BACKUP_BUCKET: replicatedBucket.bucketName,
},
})
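The two backup rules overlap: daily recovery points are kept for 30 days and Sunday points for 90. Counting what exists on a given date helps size vault storage. It also makes the recovery point objective concrete: for resources covered only by the daily rule, up to about 24 hours of data can be lost. The sketch below enumerates retained recovery points using the plan's UTC schedule. `retainedRecoveryPoints` is hypothetical and ignores backup duration and cold-storage transitions.

```typescript
interface RecoveryPoint {
  created: Date;
  rule: "daily" | "weekly";
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Recovery points still retained at `now`: daily backups at 02:00 UTC kept `dailyDays`,
// Sunday backups at 03:00 UTC kept `weeklyDays`. Newest points come first.
function retainedRecoveryPoints(now: Date, dailyDays = 30, weeklyDays = 90): RecoveryPoint[] {
  const nowMs = now.getTime();
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const points: RecoveryPoint[] = [];
  for (let back = 0; back <= Math.max(dailyDays, weeklyDays); back++) {
    const day = startOfToday - back * DAY_MS;
    const dailyMs = day + 2 * HOUR_MS;
    if (dailyMs <= nowMs && nowMs - dailyMs < dailyDays * DAY_MS) {
      points.push({ created: new Date(dailyMs), rule: "daily" });
    }
    const weeklyMs = day + 3 * HOUR_MS;
    const isSunday = new Date(weeklyMs).getUTCDay() === 0;
    if (isSunday && weeklyMs <= nowMs && nowMs - weeklyMs < weeklyDays * DAY_MS) {
      points.push({ created: new Date(weeklyMs), rule: "weekly" });
    }
  }
  return points;
}
```

On a typical day this yields 30 daily plus 13 weekly recovery points, so the weekly rule roughly triples how far back you can restore while adding only about 40% more recovery points.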

Performance Optimization

11. Performance Monitoring and Optimization

Monitor and optimize performance:

// Lambda performance optimization
const optimizedFunction = new lambda.Function(this, 'OptimizedFunction', {
runtime: lambda.Runtime.NODEJS_20_X,
code: lambda.Code.fromAsset('../lambda/optimized'),
handler: 'index.handler',
memorySize: 1024, // Lambda allocates CPU in proportion to memory
timeout: cdk.Duration.seconds(30),
reservedConcurrentExecutions: 50,
environment: {
// Readable stack traces; note that source maps add overhead whenever errors are formatted
NODE_OPTIONS: '--enable-source-maps --stack-trace-limit=1000',
},
// Dead letter queue for failed asynchronous invocations
deadLetterQueue: new sqs.Queue(this, 'FailedInvocationsDLQ', {
queueName: 'MyApp-FailedInvocations',
retentionPeriod: cdk.Duration.days(14),
}),
})
// Provisioned concurrency is set on a version or alias, not on the function itself.
// Callers must invoke the alias to get the pre-initialized execution environments.
const liveAlias = new lambda.Alias(this, 'LiveAlias', {
aliasName: 'live',
version: optimizedFunction.currentVersion,
provisionedConcurrentExecutions: 10,
})
// ElastiCache for Redis with read replicas
const redisCluster = new elasticache.CfnReplicationGroup(this, 'RedisCluster', {
replicationGroupId: 'myapp-redis',
replicationGroupDescription: 'Redis cluster for caching',
engine: 'redis',
engineVersion: '7.0',
cacheNodeType: 'cache.t3.micro',
// Cluster mode disabled: one primary plus two read replicas. numCacheClusters and
// numNodeGroups/replicasPerNodeGroup (cluster mode) describe node counts in overlapping
// ways, so this sets only numCacheClusters.
numCacheClusters: 3,
automaticFailoverEnabled: true,
multiAzEnabled: true,
cacheSubnetGroupName: cacheSubnetGroup.ref,
securityGroupIds: [redisSecurityGroup.securityGroupId],
})
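ElastiCache is most often used in a cache-aside pattern. Reads go to the cache first. On a miss, the value is loaded from the database and written back with a TTL, so stale entries eventually age out. Below is a minimal in-memory sketch. A `Map` stands in for Redis, where the equivalent calls would be `GET` and `SET key value EX ttl`. The `loader` function represents a hypothetical database query.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class CacheAside<V> {
  private readonly store = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value if still fresh; otherwise load it, cache it, and return it.
  async get(key: string, loader: (key: string) => Promise<V>): Promise<V> {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt > this.now()) return entry.value;
    const value = await loader(key);
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call after writing to the database so readers don't see stale data until the TTL expires.
  invalidate(key: string): void {
    this.store.delete(key);
  }
}
```

Reads from the cache can be served by the read replicas configured above, but invalidations must go to the primary. Replication to replicas is asynchronous, so a replica may briefly return a just-invalidated value.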

Conclusion

Cloud architecture with AWS offers incredible power and flexibility, but success depends on proper design and implementation. The patterns and practices I’ve shared here provide a solid foundation for building scalable, secure, and cost-effective cloud applications.

Key takeaways:

  • Infrastructure as Code for consistency and automation
  • Serverless architecture for cost efficiency
  • Multi-region design for high availability
  • Security-first approach with defense in depth
  • Comprehensive monitoring for observability
  • Cost optimization through right-sizing and automation

Remember, cloud architecture is an iterative process. Start simple, measure everything, and continuously optimize based on real-world usage patterns.

What cloud architecture challenges are you facing? Which AWS services have worked best for your use cases? Share your experiences!

This post reflects my experience as of October 2025. AWS services and best practices evolve rapidly, so always verify the latest documentation and regional availability.