Using AWS Athena & Glue to query Route53 logs

I recently helped a customer identify which of their domains received the most domain name lookups. This came about due a 10x increase in Route53 costs on the bill.

By default, Route53 Hosted Zones are not configured for Query Logging, and must be enabled on a per Hosted Zone basis.

This guide will let you enable Route53 Query Logging, writing it to CloudWatch Logs and then export those to S3.
Then, using AWS Glue and Athena, we can create a serverless database which we can query.

ProTip: For Route53 logging, S3 bucket and CloudWatch log-group must be in US-EAST-1 (N.Virginia) region.  This is because Route53 is a ‘global’ service, not a region based service.

This guide uses a combination of AWS CLI and AWS Console (GUI) to complete the exercise.

Create a new bucket for the logs

aws s3api create-bucket --bucket r53-cwlogs-exported-logs --create-bucket-configuration LocationConstraint=us-east-1

Update the bucket policy to allow Cloudwatch logs to write to new bucket.

aws s3api put-bucket-policy --bucket r53-cwlogs-exported-logs --policy file://examplePolicy.json --region us-east-1

This is an examplePolicy.json file to be applied to the newly created bucket above. Make sure to update the Resource (bucket name) and the Principal (make sure the region is correct).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "s3:GetBucketAcl",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::r53-cwlogs-exported-logs",
            "Principal": { "Service": "logs.us-east-1.amazonaws.com" }
        },
        {
            "Action": "s3:PutObject" ,
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::r53-cwlogs-exported-logs/*",
            "Condition": { "StringEquals": { "s3:x-amz-acl": "bucket-owner-full-control" } },
            "Principal": { "Service": "logs.us-east-1.amazonaws.com" }
        }
    ]
}

Create a log-group

aws logs create-log-group --log-group-name "/aws/route53/allR53queries" --region us-east-1

ProTip: Don’t use “.” in the log-group-name as this cause the export to s3 to fail via console and CLI (need to parse escape chars). A good name for the log-group is /aws/route53/allR53queries

Enable R53 Query Logging

Query Logging is enabled on a PER DOMAIN basis.  Repeat this step for each Hosted Zone you want to enable logging for.  Use the same log-group for all zones.

aws route53 create-query-logging-config --hosted-zone-id <value> --cloud-watch-logs-log-group-arn <value> --region us-east-1

Export CWLOGS to S3

If done manually (as per this guide), this is a one-off copy. You will need to setup regular Export jobs to keep it up to date

Set some BASH variables to make life easier.
epoch date ‘now’ and “-24h”

THEN=$(date -d "yesterday" "+%s")
NOW=$(date "+%s")
echo $NOW $THEN

Create the export to S3 task. Time format is unix epoch time.

aws logs create-export-task --task-name "export-cwlogs-route53" \
--log-group-name "/aws/route53/allR53queries" \
--from $THEN --to $NOW \
--destination "r53-cwlogs-exported-logs" \ 
--region us-east-1

Check s3 path for outputs.

aws s3 ls r53-cwlogs-exported-logs/exportedlogs/ --region us-east-1

AWS Glue – Create Database, Tables and Schema

This section should be completed in the AWS GUI console. So, go ahead and login to the console.

  •  Go to AWS Glue console
  •  Create Database (R53QueryLoggingDB)
  •  Select Database , Add Tables Manually
  •  Table name (R53Queries), choose database
  •  Select r53-cwlogs-exported-logs/exportlogs folder
  •  Choose Data Format=CSV and Delimiter=Tab <- this will need updating (no space option in the drop down). See AWS Athena steps below.
  •  Add Columns manually. see:R53 Query Log formats
  •  click Finish

Route53 Query Log format – after S3 export

https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/query-logs.html#query-logs-format  .  Once the logs have been export from CloudWatch to S3, an additional timestamp field is injected. This is the Cloudwatch Logs event timestamp.

The table below is the correct format for R53 Query Logs once in S3. This log file is delimited by a SPACE.

ColNbr Column name     Data type
1 query timestamp      string
2 log format version   string
3 timestamp            string
4 hosted zone id       string
5 query name           string
6 query type           string
7 response code        string
8 layer 4 protocol     string
9 route53 edge location string
10 resolver ip address string
11 edns client subnet  string

AWS Athena – Initial Query & correct the delimiter

  • Select Table (tick)
  • Action, View Data (which opens Athena in a new window)
  • On the Athena console, the SQL query will be pre-populated.
  • Press Run Query
  • Validate the output (columns are right, data looks right). It is likely not right and needs the Delimiter corrected. Change delimiter from TAB to SPACE.
  • Go back to Glue
  • Select table, Edit Table, find Serde Params and change SeparatorChar=<space <- press space

AWS Athena – Query

  • Go back to Athena,
  • Press Run Query again
  • Validate the output (columns are right, data looks right). Now the data should be formatted.

SQL commands

To query which FQDNs are most popular, use this

SELECT "query name", COUNT(*) as queries
FROM "R53QueryLoggingDB"."R53Queries"
GROUP BY "query name" ORDER BY queries DESC limit 100;

Athena-results

To query which R53 edge locations are most queried, use this

SELECT "route53 edge location", COUNT(*) as hits
FROM "R53QueryLoggingDB"."R53Queries"
GROUP BY "route 53 edge location" ORDER BY hits DESC limit 100;

The 3 letter acronym for the R53 edge locations is the same as airport codes.  E.g; MEL=Melbourne, LAX=Los Angeles, SFO=San Francisco

Update (refresh) cwlog data to S3 bucket

  • Go to CloudWatch Console and re-export the CW-log-group to S3.  OR create an export-task as per the steps above.
  • Go to Athena and re-run the queries, and the data should be changed/updated.

Some things to consider to improve this solution further.

  • automate the streaming of r53 logs to S3 bucket to keep data up to date
  • set a retention policy (maybe 14 days) for the CloudWatch Log-group
  • set a life-cycle policy on the S3 bucket to either delete or archive (to Glacier) the R53 log data
  • disable query logging if no longer needed

Some links

https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3ExportTasks.html

https://ig.nore.me/ <- Check out and subscribe to this excellent blog for more great AWS content.  THANKS to @ArjenSchwarz for his help with this.

HTH