AWS VPC Flow Logs with EKS: Enabling Network Visibility Using Terraform
January 10, 2026
I was troubleshooting a connectivity issue between pods in an EKS cluster and an external API last week. The pods were timing out, but I had no visibility into whether the traffic was even leaving the VPC. Without flow logs, I was essentially flying blind - making guesses about security group rules and NACLs without any data to back them up. That experience reminded me why VPC Flow Logs should be enabled from day one on any production infrastructure.
The Problem
Network troubleshooting in AWS without flow logs is painful. You’re left asking questions like:
- Is traffic actually reaching my NAT Gateway?
- Are my security groups blocking something I don’t expect?
- Why are connections to this external service timing out?
- Is there unexpected traffic hitting my VPC endpoints?
Without VPC Flow Logs, you’re stuck making educated guesses and iterating on security group rules hoping something sticks. Even worse, when security asks for an audit of network traffic patterns, you have nothing to show them.
What Are VPC Flow Logs?
VPC Flow Logs capture information about IP traffic going to and from network interfaces in your VPC. Think of them as a network tap that records metadata about every connection attempt - successful or not.
[Diagram: Pod 1 (10.0.1.15) and Pod 2 (10.0.1.16) each attach to an ENI inside the VPC. The pod ENIs, the NAT Gateway, the ALB, and the VPC endpoints (S3, ECR, etc.) emit flow records to the flow log capture layer, which publishes to CloudWatch Logs or an S3 bucket.]
What Gets Captured
Each flow log record contains:
| Field | Description |
|---|---|
| srcaddr | Source IP address |
| dstaddr | Destination IP address |
| srcport | Source port |
| dstport | Destination port |
| protocol | IANA protocol number (6 = TCP, 17 = UDP) |
| packets | Number of packets transferred |
| bytes | Number of bytes transferred |
| action | ACCEPT or REJECT |
| log-status | Logging status (OK, NODATA, SKIPDATA) |
Here’s what a typical flow log record looks like:
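2 123456789012 eni-abc123 10.0.1.15 52.94.76.5 49321 443 6 25 5000 1609459200 1609459260 ACCEPT OK
This tells me: a pod at 10.0.1.15 sent traffic from ephemeral port 49321 to 52.94.76.5:443 over TCP, and the security groups and NACLs allowed it. The record covers 25 packets (5000 bytes) within a 60-second capture window. Note that in the default format the source port comes before the destination port.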
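To make the field order concrete, here is a minimal Python sketch that splits a default-format record into named fields. The helper names (`parse_flow_record`, `describe`) are my own for illustration, and the sketch assumes the 14-field default format; custom formats would need a different field list.

```python
# Field order of the default flow log format (14 space-separated fields).
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
NUMERIC_FIELDS = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA protocol numbers


def parse_flow_record(line):
    """Split one default-format record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}")
    record = dict(zip(DEFAULT_FIELDS, values))
    # NODATA and SKIPDATA records carry "-" in place of numbers.
    for key in NUMERIC_FIELDS:
        record[key] = int(record[key]) if record[key] != "-" else None
    return record


def describe(record):
    """Render a one-line, human-readable summary of a parsed record."""
    proto = PROTOCOL_NAMES.get(record["protocol"], str(record["protocol"]))
    window = record["end"] - record["start"]
    return (
        f"{record['action']} {proto} "
        f"{record['srcaddr']}:{record['srcport']} -> "
        f"{record['dstaddr']}:{record['dstport']} "
        f"({record['packets']} packets, {record['bytes']} bytes, {window}s window)"
    )


sample = (
    "2 123456789012 eni-abc123 10.0.1.15 52.94.76.5 49321 443 "
    "6 25 5000 1609459200 1609459260 ACCEPT OK"
)
print(describe(parse_flow_record(sample)))
```

The `start` and `end` values are Unix timestamps for the aggregation window, so their difference is the window length, not the connection duration.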
The Three Capture Levels
VPC Flow Logs can be attached at three different levels:
[Diagram: a VPC-level flow log captures all traffic in the VPC (10.0.0.0/16); a subnet-level flow log captures one subnet's traffic (for example Subnet A, 10.0.1.0/24); an ENI-level flow log captures a single interface (for example ENI-3 in Subnet B, 10.0.2.0/24).]
For EKS clusters, I recommend VPC-level flow logs to capture all traffic that crosses an ENI, including pod-to-pod communication between nodes, egress through NAT Gateways, and VPC endpoint traffic.
Benefits of VPC Flow Logs
1. Network Troubleshooting
When pods can’t connect to external services, flow logs immediately show whether traffic is being rejected:
# Find rejected traffic involving a specific pod IP.
# Terms containing dots must be double-quoted inside the filter pattern.
aws logs filter-log-events \
  --log-group-name /aws/vpc-flow-log/eks-cluster \
  --filter-pattern '"10.0.1.15" REJECT'
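If you see REJECT entries, the interface, direction, and port in each record tell you where to look in your security group and NACL rules. Flow logs don't name the rule that blocked the traffic, but they narrow the search considerably.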
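Once the records are exported, a small script can group the rejects by direction. The sketch below is a rough Python illustration, not a definitive tool: `summarize_rejects` is a hypothetical helper, the sample records are made up, and it assumes the default 14-field format. A reject where the pod is the source suggests checking outbound rules; a reject where the pod is the destination suggests inbound rules.

```python
from collections import Counter


def summarize_rejects(lines, pod_ip):
    """Count REJECT records involving pod_ip by (direction, remote IP, dst port)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Default format: srcaddr=3, dstaddr=4, dstport=6, action=12.
        if len(fields) != 14 or fields[12] != "REJECT":
            continue
        src, dst, dstport = fields[3], fields[4], fields[6]
        if src == pod_ip:
            counts[("egress", dst, dstport)] += 1
        elif dst == pod_ip:
            counts[("ingress", src, dstport)] += 1
    return counts


# Made-up sample records for illustration only.
records = [
    "2 123456789012 eni-abc123 10.0.1.15 52.94.76.5 49321 443 6 3 180 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123 10.0.1.15 52.94.76.5 49322 443 6 3 180 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123 10.0.2.20 10.0.1.15 40000 8080 6 1 60 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123 10.0.1.15 10.0.1.16 50000 80 6 10 900 1609459200 1609459260 ACCEPT OK",
]

for (direction, remote, port), count in summarize_rejects(records, "10.0.1.15").most_common():
    print(f"{direction:8} {remote:15} port {port:5} x{count}")
```

Because security groups are stateful and NACLs are stateless, a flow whose request is accepted but whose response is rejected usually points at a NACL rather than a security group.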
2. Security Monitoring
Flow logs reveal unusual traffic patterns that might indicate compromise:
- Unexpected outbound connections to unknown IPs
- Port scanning activity (many connections to different ports)
- Data exfiltration (large outbound transfers to unusual destinations)
- Lateral movement attempts between subnets
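# Find accepted traffic to destinations outside the VPC CIDR (10.0.0.0/16).
# Flow records contain IP addresses, not hostnames, so filter on dstaddr (field 5).
# To exclude AWS services, compare the results against the published AWS IP ranges.
aws logs filter-log-events \
  --log-group-name /aws/vpc-flow-log/eks-cluster \
  --filter-pattern "ACCEPT" \
  --query 'events[].message' --output text \
  | tr '\t' '\n' \
  | awk '$5 !~ /^10\.0\./'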
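The port scanning pattern from the list above can be approximated with a simple heuristic: one source hitting many distinct destination ports on one target, all rejected. This Python sketch is an assumption-laden illustration; `find_port_scanners` and the threshold of 20 ports are hypothetical choices, and the sample data is synthetic.

```python
from collections import defaultdict


def find_port_scanners(lines, min_ports=20):
    """Return {(src, dst): distinct_port_count} for pairs at or above min_ports.

    Only REJECT records are counted, so that return traffic to a client's
    ephemeral ports is less likely to be mistaken for a scan.
    """
    ports_seen = defaultdict(set)
    for line in lines:
        fields = line.split()
        # Default format: srcaddr=3, dstaddr=4, dstport=6, action=12.
        if len(fields) != 14 or fields[12] != "REJECT":
            continue
        ports_seen[(fields[3], fields[4])].add(fields[6])
    return {
        pair: len(ports)
        for pair, ports in ports_seen.items()
        if len(ports) >= min_ports
    }


# Synthetic data: one source probing ports 1-25 on a pod, plus normal traffic.
scan = [
    f"2 123456789012 eni-abc123 203.0.113.7 10.0.1.15 40000 {port} 6 1 40 1609459200 1609459260 REJECT OK"
    for port in range(1, 26)
]
normal = [
    "2 123456789012 eni-abc123 10.0.1.15 52.94.76.5 49321 443 6 25 5000 1609459200 1609459260 ACCEPT OK"
]
print(find_port_scanners(scan + normal))
```

A real detector would also bound the time window and tune the threshold to the environment; this only shows the shape of the check.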
3. Compliance and Auditing
Many compliance frameworks (SOC 2, PCI-DSS, HIPAA) require network traffic logging. Flow logs provide:
- Complete record of all network connections
- Evidence of security group effectiveness
- Audit trail for forensic investigation
- Data retention in S3 for long-term storage
4. Cost Optimization
Flow logs help identify wasted network resources:
- NAT Gateway traffic that could use VPC endpoints
- Cross-AZ traffic that could be optimized
- Unused or underutilized network paths
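As a rough illustration of the cross-AZ point, the Python sketch below sums accepted bytes between subnets that live in different Availability Zones. The subnet-to-AZ mapping is an assumption for this example, `cross_az_bytes` is a hypothetical helper, and the records are made up.

```python
import ipaddress
from collections import Counter

# Assumed mapping of private subnets to AZs; replace with your own layout.
SUBNET_AZ = {
    ipaddress.ip_network("10.0.1.0/24"): "us-west-2a",
    ipaddress.ip_network("10.0.2.0/24"): "us-west-2b",
    ipaddress.ip_network("10.0.3.0/24"): "us-west-2c",
}


def az_of(ip):
    """Return the AZ for an IP in a known subnet, or None for anything else."""
    addr = ipaddress.ip_address(ip)
    for network, az in SUBNET_AZ.items():
        if addr in network:
            return az
    return None


def cross_az_bytes(lines):
    """Sum bytes of ACCEPT records whose source and destination AZs differ."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Default format: srcaddr=3, dstaddr=4, bytes=9, action=12.
        if len(fields) != 14 or fields[12] != "ACCEPT":
            continue
        src_az, dst_az = az_of(fields[3]), az_of(fields[4])
        if src_az and dst_az and src_az != dst_az:
            totals[(src_az, dst_az)] += int(fields[9])
    return totals


# Made-up sample records for illustration only.
records = [
    "2 123456789012 eni-a 10.0.1.15 10.0.2.20 50000 5432 6 10 4000 1609459200 1609459260 ACCEPT OK",
    "2 123456789012 eni-a 10.0.1.15 10.0.1.16 50001 80 6 5 1000 1609459200 1609459260 ACCEPT OK",
    "2 123456789012 eni-a 10.0.1.15 52.94.76.5 49321 443 6 25 5000 1609459200 1609459260 ACCEPT OK",
    "2 123456789012 eni-b 10.0.2.20 10.0.1.15 5432 50000 6 12 6000 1609459200 1609459260 ACCEPT OK",
]
for (src_az, dst_az), total in cross_az_bytes(records).items():
    print(f"{src_az} -> {dst_az}: {total} bytes")
```

With VPC-level flow logs the same flow is recorded on both the source and destination ENI, so treat these totals as an upper bound unless you deduplicate by interface.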
Enabling VPC Flow Logs with the EKS Terraform Module
Now for the practical part. The terraform-aws-modules/eks/aws module doesn’t directly create VPC flow logs, but the companion terraform-aws-modules/vpc/aws module does. Here’s how to configure them together.
Complete Terraform Configuration
# VPC with Flow Logs enabled
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-west-2a", "us-west-2b", "us-west-2c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true

  # Enable DNS support for VPC endpoints
  enable_dns_hostnames = true
  enable_dns_support   = true

  # Subnet tags required for EKS
  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

  # VPC Flow Logs Configuration
  enable_flow_log                           = true
  create_flow_log_cloudwatch_log_group      = true
  create_flow_log_cloudwatch_iam_role       = true
  flow_log_max_aggregation_interval         = 60
  flow_log_cloudwatch_log_group_name_prefix = "/aws/vpc-flow-log/"
  flow_log_cloudwatch_log_group_name_suffix = "eks-cluster"

  # Retain logs for 30 days (adjust based on compliance requirements)
  flow_log_cloudwatch_log_group_retention_in_days = 30

  # Optional: Enable KMS encryption for logs
  flow_log_cloudwatch_log_group_kms_key_id = aws_kms_key.flow_logs.arn

  tags = {
    Environment = "production"
    Terraform   = "true"
  }
}
# KMS key for encrypting flow logs (optional but recommended)
resource "aws_kms_key" "flow_logs" {
  description             = "KMS key for VPC Flow Logs encryption"
  deletion_window_in_days = 7
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "Allow CloudWatch Logs"
        Effect = "Allow"
        Principal = {
          Service = "logs.${data.aws_region.current.name}.amazonaws.com"
        }
        Action = [
          "kms:Encrypt*",
          "kms:Decrypt*",
          "kms:ReEncrypt*",
          "kms:GenerateDataKey*",
          "kms:Describe*"
        ]
        Resource = "*"
        Condition = {
          ArnLike = {
            "kms:EncryptionContext:aws:logs:arn" = "arn:aws:logs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:*"
          }
        }
      }
    ]
  })
}

resource "aws_kms_alias" "flow_logs" {
  name          = "alias/vpc-flow-logs"
  target_key_id = aws_kms_key.flow_logs.key_id
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
# EKS Cluster using the VPC
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "my-eks-cluster"
  cluster_version = "1.29"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Expose the cluster endpoint both publicly and privately
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  eks_managed_node_groups = {
    default = {
      min_size       = 2
      max_size       = 10
      desired_size   = 3
      instance_types = ["m5.large"]
      capacity_type  = "ON_DEMAND"
    }
  }

  tags = {
    Environment = "production"
    Terraform   = "true"
  }
}
Key Configuration Flags Explained
| Flag | Purpose | Recommended Value |
|---|---|---|
| enable_flow_log | Master switch for flow logs | true |
| create_flow_log_cloudwatch_log_group | Auto-create CloudWatch log group | true |
| create_flow_log_cloudwatch_iam_role | Auto-create IAM role for publishing | true |
| flow_log_max_aggregation_interval | Seconds to aggregate before publishing | 60 (1 minute) |
| flow_log_cloudwatch_log_group_retention_in_days | Log retention period | 30 to 365 based on compliance |
Alternative: Publishing to S3
For long-term storage or cost optimization, you can publish to S3 instead of CloudWatch:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  # ... other configuration ...

  enable_flow_log                   = true
  flow_log_destination_type         = "s3"
  flow_log_destination_arn          = aws_s3_bucket.flow_logs.arn
  flow_log_file_format              = "parquet" # Better for Athena queries
  flow_log_max_aggregation_interval = 600       # 10 minutes for cost savings

  # Partition logs by hour for efficient querying
  flow_log_per_hour_partition = true
}

resource "aws_s3_bucket" "flow_logs" {
  bucket = "my-vpc-flow-logs-${data.aws_caller_identity.current.account_id}"
}

resource "aws_s3_bucket_lifecycle_configuration" "flow_logs" {
  bucket = aws_s3_bucket.flow_logs.id

  rule {
    id     = "transition-to-glacier"
    status = "Enabled"

    # Empty filter applies the rule to every object in the bucket
    filter {}

    transition {
      days          = 30
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }
}
Querying Flow Logs
Once enabled, you can query flow logs to troubleshoot issues.
CloudWatch Logs Insights
# Find all rejected traffic (set the query time range to the last hour)
fields @timestamp, srcAddr, dstAddr, srcPort, dstPort, action
| filter action = "REJECT"
| sort @timestamp desc
| limit 100

# Find traffic from a specific pod subnet (10.0.1.0/24)
fields @timestamp, srcAddr, dstAddr, dstPort, action, bytes
| filter srcAddr like /^10\.0\.1\./
| stats sum(bytes) as totalBytes by dstAddr, dstPort
| sort totalBytes desc

# Identify top talkers (most traffic)
fields srcAddr, dstAddr, bytes
| filter action = "ACCEPT"
| stats sum(bytes) as totalBytes by srcAddr
| sort totalBytes desc
| limit 20
Using Athena for S3 Logs
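If you’re publishing to S3, create an Athena table for querying. The definition below assumes the default plain-text file format; if you publish Parquet files, as in the S3 configuration above, you need a Parquet table definition instead. Partitions are not discovered automatically with this layout, so add them with ALTER TABLE ... ADD PARTITION before querying.
CREATE EXTERNAL TABLE vpc_flow_logs (
  version int,
  account_id string,
  interface_id string,
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  protocol bigint,
  packets bigint,
  bytes bigint,
  start bigint,
  `end` bigint,
  action string,
  log_status string
)
PARTITIONED BY (`date` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://my-vpc-flow-logs-123456789012/AWSLogs/123456789012/vpcflowlogs/us-west-2/'
TBLPROPERTIES ("skip.header.line.count"="1");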
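A table partitioned by date needs each day's partition registered before Athena returns rows for it. Here is a small Python sketch that generates the statements for a date range. It assumes the standard daily S3 layout (`.../region/YYYY/MM/DD/`) without hourly partitioning, and `partition_statements` is a hypothetical helper name; the bucket and account ID are the placeholder values from the table definition.

```python
from datetime import date, timedelta


def partition_statements(table, bucket, account_id, region, start, end):
    """Build one ALTER TABLE ... ADD PARTITION statement per day, inclusive."""
    base = f"s3://{bucket}/AWSLogs/{account_id}/vpcflowlogs/{region}"
    statements = []
    day = start
    while day <= end:
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (`date`='{day.isoformat()}') "
            f"LOCATION '{base}/{day:%Y/%m/%d}/';"
        )
        day += timedelta(days=1)
    return statements


statements = partition_statements(
    table="vpc_flow_logs",
    bucket="my-vpc-flow-logs-123456789012",
    account_id="123456789012",
    region="us-west-2",
    start=date(2026, 1, 9),
    end=date(2026, 1, 10),
)
for statement in statements:
    print(statement)
```

Run the generated statements in the Athena query editor, then filter on the `date` column so each query scans only the days it needs.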
Architecture Overview
Here’s how everything fits together with EKS:
[Diagram: pods run on EKS worker nodes; node egress goes through the NAT Gateways to the internet, inbound traffic arrives through the ALB, and AWS API calls go through VPC endpoints (ECR, S3, STS). All VPC traffic is captured into the CloudWatch Logs group /aws/vpc-flow-log/eks-cluster, exported to an S3 bucket for long-term storage, and queried ad hoc with Athena.]
Cost Considerations
VPC Flow Logs aren’t free, so here are strategies to manage costs:
- Use S3 instead of CloudWatch for long-term storage - significantly cheaper
- Increase aggregation interval to 600 seconds (10 minutes) if you don’t need real-time data
- Use Parquet format with S3 - AWS documentation cites roughly 20% less storage than gzip-compressed plain text, and Athena queries scan far less data
- Enable per-hour partitioning for efficient Athena queries
- Set appropriate retention - don’t keep logs longer than compliance requires
- Consider narrowing what you capture in high-traffic environments, for example REJECT-only traffic or only specific subnets (though this reduces visibility)
Approximate costs (us-west-2):
- CloudWatch Logs: $0.50 per GB ingested + $0.03 per GB stored
- S3 Standard: $0.023 per GB stored
- S3 Glacier: $0.004 per GB stored
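To see how these prices add up, here is a back-of-the-envelope Python sketch using the approximate figures above. It is a simplification under stated assumptions: a steady daily volume, a 30-day month, no compression, and no log delivery or request charges. The function names are my own.

```python
# Approximate prices from the list above (USD, us-west-2).
CW_INGEST_PER_GB = 0.50
CW_STORAGE_PER_GB_MONTH = 0.03
S3_STANDARD_PER_GB_MONTH = 0.023

DAYS_PER_MONTH = 30  # simplifying assumption


def cloudwatch_monthly_cost(gb_per_day, retention_days=30):
    """Ingestion for one month plus storage of the steady-state retained volume."""
    ingested_gb = gb_per_day * DAYS_PER_MONTH
    stored_gb = gb_per_day * retention_days
    return ingested_gb * CW_INGEST_PER_GB + stored_gb * CW_STORAGE_PER_GB_MONTH


def s3_monthly_storage_cost(gb_per_day, retention_days=30):
    """Storage only, for the steady-state volume kept in S3 Standard."""
    return gb_per_day * retention_days * S3_STANDARD_PER_GB_MONTH


daily_gb = 10  # hypothetical volume for a busy VPC
print(f"CloudWatch: ${cloudwatch_monthly_cost(daily_gb):.2f} per month")
print(f"S3 Standard storage: ${s3_monthly_storage_cost(daily_gb):.2f} per month")
```

At 10 GB per day this works out to about $159 per month in CloudWatch (150 for ingestion, 9 for storage) against about $6.90 for S3 Standard storage, which is why ingestion, not storage, dominates the CloudWatch bill.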
Key Learnings
- VPC Flow Logs capture network metadata at the ENI level - They record source/destination IPs, ports, protocols, and whether traffic was accepted or rejected by security groups and NACLs
- Enable flow logs at the VPC level for EKS - This captures all traffic including pod-to-pod communication, NAT Gateway egress, and VPC endpoint usage
- The terraform-aws-modules/vpc/aws module handles everything - Set enable_flow_log = true together with the create_flow_log_cloudwatch_log_group and create_flow_log_cloudwatch_iam_role flags, and it creates the log group, IAM role, and flow log resource automatically
- Use CloudWatch for real-time troubleshooting, S3 for long-term storage - CloudWatch Logs Insights gives you quick queries, while S3 with Athena is better for historical analysis and compliance
- Always encrypt flow logs - Use KMS encryption for CloudWatch log groups, especially in regulated environments
- Parquet format with hourly partitions optimizes S3 costs - The Parquet format reduces storage significantly and partitions make Athena queries faster and cheaper
- Flow logs are essential for security incident response - When something goes wrong, having a complete record of network traffic is invaluable for forensic investigation
- Consider cost from day one - A busy VPC can generate gigabytes of logs daily; plan your retention and storage tier strategy upfront
The biggest lesson from my troubleshooting experience: the time to enable flow logs is before you need them. Trying to debug network issues without flow logs is like debugging code without logs - possible, but unnecessarily painful.