Streaming Events to External Data Lakes

March 28, 2025

With the recent introduction of the Event Sink provider channel, anyone can now build up their own dataset from the events collected through their EdgeTag tags. Today, we're going to have a look at how this channel can be used to stream events to any data lake through AWS Firehose.

Setting up a Cloudflare worker endpoint

In order to authenticate requests to AWS Firehose we will need to deploy an HTTP endpoint that will have access to the AWS keys required to relay event payloads.

We can set this up by creating a new Cloudflare worker and setting up the required secrets in the Settings page:

The Variables and Secrets section inside the Settings page for a Cloudflare worker in the CF Dashboard

Clicking Add will allow you to add a secret containing your AWS parameters:

The Add form to create a secret for your Worker

We recommend using the Secret type for these variables, since their values will not be visible after being set, unlike other variable types.

The values you will need to use AWS Firehose are:

  • AWS region - e.g. us-east-1
  • Firehose stream name - the name of the stream to send events to
  • AWS access ID - a value representing the AWS credential ID to use Firehose with
  • AWS access key - the corresponding key value for the given AWS access ID
  • Bearer token - any random value you generate that will be used to authenticate calls made by EdgeTag to your worker

After deploying these secrets, you should see them listed in your configured Cloudflare worker:

The Variables and Secrets section after deploying the configured secrets
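
If you prefer the command line over the dashboard, the same secrets can be created with Wrangler; each command prompts for the secret value and uploads it to the worker (the variable names match the ones the worker code reads from its environment):

```shell
wrangler secret put AWS_REGION
wrangler secret put AWS_FIREHOSE_STREAM
wrangler secret put AWS_ACCESS_ID
wrangler secret put AWS_ACCESS_KEY
wrangler secret put BEARER_TOKEN
```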

Using Wrangler to Create and Deploy the Worker

Now that we have the environment variables set up, we can use Wrangler to create a project that will allow us to build and deploy the worker.

Let's start by creating a project by running the following command:

wrangler init aws-firehose-example

The CLI tool will guide you through the setup process by asking you questions about the project. For the purposes of this example, let's use the following choices:

  • Hello World Starter
  • Worker only
  • TypeScript
  • No Git version control - you can opt to initialize a Git repository if you wish
  • No deployment - for the purposes of this example, we'll just build the worker locally

This generates a Hello World starter project that contains the bare essentials of what we need to build our code.

A directory listing for the generated Worker project using Wrangler

In order to be able to connect to AWS Firehose, we'll need to install the aws4 and axios packages. In a terminal, navigate to the project directory and install the packages:

npm i aws4 axios

Add the following entry to wrangler.jsonc to enable Node.js compatibility for the installed npm packages:

"compatibility_flags": ["nodejs_compat"],

Next, let's update the worker-configuration.d.ts file to declare the secrets we set up in the worker. This will allow TypeScript to correctly identify the variables we use in code:

declare namespace Cloudflare {
  interface Env {
    AWS_ACCESS_ID: string
    AWS_ACCESS_KEY: string
    AWS_FIREHOSE_STREAM: string
    AWS_REGION: string
    BEARER_TOKEN: string
  }
}
interface Env extends Cloudflare.Env {}

Now we can write the implementation for the worker that will forward events to AWS Firehose with all of the correct credentials:

import aws4 from 'aws4'
import axios from 'axios'
 
const getEndpoint = (region: string) =>
  `https://firehose.${region}.amazonaws.com`
 
export default {
  async fetch(request, env, ctx): Promise<Response> {
    // we will receive a POST request from EdgeTag
    if (request.method === 'POST') {
      // check that the request carries the expected Authorization header and reject it if missing
      if (
        request.headers.get('Authorization') !== `Bearer ${env.BEARER_TOKEN}`
      ) {
        // reject this request as it does not have authorization to post events
        return new Response('Not allowed', { status: 403 })
      }
 
      const endpoint = getEndpoint(env.AWS_REGION)
 
      // we'll grab the payload as a string so we don't have to reencode it as JSON
      const payload = await request.text()
 
      // we prepare the payload for the Firehose PutRecord API
      const firehoseRequest = {
        host: new URL(endpoint).host,
        method: 'POST',
        url: endpoint,
        path: '/',
        headers: {
          'Content-Type': 'application/x-amz-json-1.1',
          'X-Amz-Target': 'Firehose_20150804.PutRecord',
        },
        body: JSON.stringify({
          DeliveryStreamName: env.AWS_FIREHOSE_STREAM,
          // PutRecord expects the record data as a base64-encoded blob;
          // Buffer is available thanks to the nodejs_compat flag
          Record: { Data: Buffer.from(payload).toString('base64') },
        }),
      }
 
      // sign the outgoing Firehose request (aws4 adds the signature headers in place)
      aws4.sign(firehoseRequest, {
        accessKeyId: env.AWS_ACCESS_ID,
        secretAccessKey: env.AWS_ACCESS_KEY,
      })
 
      try {
        // send the request to Firehose
        const response = await axios({
          method: firehoseRequest.method,
          url: firehoseRequest.url,
          headers: firehoseRequest.headers,
          data: firehoseRequest.body,
        })
 
        // if successful, return a 200 OK response and log it to the console
        console.log('Event sent to Firehose', response.data)
        return new Response('OK')
      } catch (err) {
        // if any error happens, we log it to the console and return a 500 response
        console.log(`Error while sending data to Firehose`, err)
        return new Response(
          `Error while sending data to Firehose: ${(err as Error).message}`,
          { status: 500 }
        )
      }
    }
    return new Response('Hello World!')
  },
} satisfies ExportedHandler<Env>

You can now build the Worker using the build command:

wrangler build

The final code will be output in the dist directory:

The contents of the dist directory after building the Worker code

We can now copy the contents of dist/index.js and paste it into the Cloudflare editor to deploy the updated worker code.
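
Alternatively, if you would rather skip the copy-and-paste step, Wrangler can build and publish the worker in one go; this uploads the code while keeping the secrets you configured in the dashboard:

```shell
wrangler deploy
```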

Setting up the Event Sink channel

We can now use the worker's public URL and the bearer token stored in its secrets to configure the Event Sink channel form appropriately:

Setting up the Event Sink provider channel with appropriate headers

After saving this provider channel, events will automatically start flowing through the Cloudflare endpoint you provided in the form.
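
You can also send a test event by hand with curl to confirm the endpoint accepts authorized requests; the URL, token, and payload below are placeholders for your own worker URL, the bearer token you generated earlier, and a real event body:

```shell
curl -X POST "https://aws-firehose-example.example.workers.dev" \
  -H "Authorization: Bearer YOUR_BEARER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"eventName": "Test"}'
```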

You can open the realtime logs on your deployed worker to see events coming through and debug any errors by updating the worker.
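
The same realtime logs can also be followed from a terminal with Wrangler's tail command:

```shell
wrangler tail aws-firehose-example
```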


That's all it takes to create a connection to forward events to your own Firehose-backed data lake!