Making RudderStack Ad-Blocker Proof in 66 Lines of Code
By Max Werner
•On Aug 26, 2021
•Updated Nov 13, 2021
What Does the RudderStack JavaScript SDK Need?
To function, the RudderStack JavaScript SDK needs to do three things:
- Load the JS SDK on the page
- Retrieve the source config from the RudderStack Data Plane
- Send AJAX POST requests to your Data Plane URL for identify, track, page, group calls and so on
How to Prevent Ad-Blocking of Your Analytics and Data Collection
That’s quite simple: don’t get the SDK from cdn.rudderlabs.com and don’t send requests to *.dataplane.rudderstack.com. We’ll do this by proxying the requests through a CloudFlare worker. And if your site is hosted through CloudFlare, you can have it run off a subdomain very easily too!
We’ll do this in the following easy steps:
- Set up a CloudFlare worker with the CloudFlare worker CLI
- Add the code required (all 66 lines of it)
- Publish the worker
- Attach it to a subdomain (optional)
- Configure CloudFlare’s SSL Settings (optional)
- Configure our website’s JavaScript implementation to use our URLs
Step 1: Setting up CloudFlare Workers
CloudFlare’s documentation is quite easy to follow and can be found here. The command line steps are:
npm install -g @cloudflare/wrangler
wrangler login
wrangler generate my-rudder-proxy
This will generate a my-rudder-proxy directory containing all the code required for the CloudFlare worker setup.
Step 2: Add the code required (all 66 lines of it)
Open the worker’s index.js file and replace its contents with the contents of this GitHub Gist. Simply replace the Data Plane URL in line 7 with yours and that’s it.
If you’re interested in why this is so easy, read on, otherwise skip down to the next step.
This works like a charm for two important reasons:
(1) RudderStack lets you configure where it retrieves the source config from in its load method, like so:
rudderanalytics.load(
RUDDER_WRITE_KEY,
DATA_PLANE_URL,
{ configUrl: SOME_URL }
);
The RudderStack JS SDK automatically appends ‘/sourceConfig’ to this SOME_URL. As you can see in the code, a call to our worker’s ‘/sourceConfig’ endpoint is forwarded to RudderStack’s ‘/sourceConfig’ endpoint. Your write key is base64 encoded in the request’s Authorization header, which is why we simply pass on the request headers. This means the browser calls “your” URL (the worker), which gets the source config from the real RudderStack and hands it back to you. And since the browser’s request goes to your own subdomain, it is not caught by the tracker lists that browsers and ad-blocker extensions maintain.
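To make the Authorization header more concrete, here is a minimal sketch of how a write key could end up in it. It assumes the SDK uses HTTP Basic auth with the write key as the username and an empty password; the function name basicAuthHeader and the sample key are made up for illustration and are not part of the SDK.

```javascript
// Assumption: HTTP Basic auth, write key as username, empty password.
function basicAuthHeader(writeKey) {
  const raw = `${writeKey}:`;
  // btoa exists in browsers and Workers; fall back to Buffer on older Node.
  const encoded = typeof btoa === 'function'
    ? btoa(raw)
    : Buffer.from(raw, 'utf8').toString('base64');
  return `Basic ${encoded}`;
}

// 'abc' is a made-up write key.
console.log(basicAuthHeader('abc')); // Basic YWJjOg==
```

Because the key travels in a header rather than in the URL, a proxy that forwards headers untouched never needs to know the write key itself.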
(2) RudderStack requests can go anywhere; that’s what your DATA_PLANE_URL is for. We simply point the Data Plane URL at our worker, which forwards each request to the actual RudderStack Data Plane URL.
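The two points above can be sketched as a small worker. This is not the author’s gist, just a minimal illustration of the same idea under a few assumptions: the Data Plane URL is a placeholder, the SDK is served at /dataPlane (the path used in Step 6), and the source config is fetched from the Data Plane as described above (change SOURCE_CONFIG_ORIGIN if your setup retrieves it from a different RudderStack host). The names upstreamUrlFor and handleRequest are hypothetical.

```javascript
// Sketch only, not the author's gist.
const SDK_URL = 'https://cdn.rudderlabs.com/v1/rudder-analytics.min.js';
const DATA_PLANE_URL = 'https://something.dataplane.rudderstack.com'; // placeholder
const SOURCE_CONFIG_ORIGIN = DATA_PLANE_URL; // assumption, see the note above

// Map a URL on the worker to the upstream URL it stands in for.
function upstreamUrlFor(requestUrl) {
  const { pathname, search } = new URL(requestUrl);
  if (pathname === '/dataPlane') return SDK_URL;
  if (pathname.startsWith('/sourceConfig')) {
    return SOURCE_CONFIG_ORIGIN + pathname + search;
  }
  // Everything else (identify, track, page, ...) goes to the Data Plane.
  return DATA_PLANE_URL + pathname + search;
}

// Forward the request with its method, headers, and body unchanged, so the
// Authorization header carrying the write key reaches RudderStack.
async function handleRequest(request) {
  const upstream = upstreamUrlFor(request.url);
  if (upstream === SDK_URL) return fetch(SDK_URL);
  const hasBody = request.method !== 'GET' && request.method !== 'HEAD';
  return fetch(upstream, {
    method: request.method,
    headers: request.headers,
    body: hasBody ? request.body : undefined,
  });
}

// Register the handler only where a Workers-style global listener exists.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```

Keeping the URL mapping in its own pure function makes it easy to check the routing without deploying anything.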
Step 3: Publish the worker
Open your wrangler.toml file and update it to look like this:
name = "rsp"
type = "webpack"
account_id = "YOUR_ACCOUNT_ID"
workers_dev = true
[env.production]
route = "yoursub.domain.com/*"
zone_id = "YOUR_ZONE_ID"
You can retrieve the account_id and zone_id from your CloudFlare dashboard. As for the route, simply choose the subdomain you want to run this through. In my case I use rsp.obsessiveanalytics.com (rsp standing for RudderStack Proxy ;)).
That’s it as far as the configuration goes. Simply run wrangler publish --env production and it’ll handle the rest for you.
Step 4: Attach it to a subdomain (optional)
This is quite simple. All you need to do is create a DNS record for the subdomain; what it resolves to doesn’t matter. In my case I use rsp as my subdomain, so I simply add an AAAA record for it pointing to the IPv6 placeholder 100:: and an A record pointing to 192.0.2.1. The details for this can be found here but suffice to say, this will work.
Step 5: CloudFlare’s SSL Settings (optional)
This is only required if you want to use your own subdomain. If you want to run the worker through a CloudFlare-provided *.workers.dev subdomain, you don’t need this step.
You will get Status Code 525 errors if your SSL settings are not set to Full or Full (strict). So simply set the SSL settings to either and we’re done here.
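A quick way to see whether the SSL setting is the problem is to request the SDK through the worker and look at the status code. The sketch below is only an illustration: diagnoseProxyStatus and checkProxy are hypothetical helper names, foo.bar.com is a placeholder domain, and the /dataPlane path is the one used for the script tag in Step 6.

```javascript
// Turn the status code of a test request into a hint.
// 525 is CloudFlare's "SSL handshake failed" status.
function diagnoseProxyStatus(status) {
  if (status === 525) {
    return 'SSL handshake failed: set SSL to Full or Full (strict)';
  }
  if (status >= 200 && status < 300) return 'ok';
  return `unexpected status ${status}`;
}

// Fetch the SDK through the worker and report the result.
// Needs a global fetch (browsers, Node 18+).
async function checkProxy(workerOrigin) {
  const response = await fetch(`${workerOrigin}/dataPlane`);
  return diagnoseProxyStatus(response.status);
}

// Usage (placeholder domain):
// checkProxy('https://foo.bar.com').then(console.log);
```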
BE CAREFUL
This can break connections to your origin servers for the same domain if you have any. If you have it set to Flexible (default) and switch to Full or Full (strict), your origin servers need an SSL certificate, either self-signed or CloudFlare-provided. If they don’t have one, your origin servers are seen as “down”. This has nothing to do with the workers setup but with other servers you might have serving things for the same domain. If you don’t know what this means, skip steps 4 and 5 and just use the .workers.dev domain. It’ll work, although it’ll be less “clean”, as ad-blockers might include such public domains at some point in the future!
Step 6: Configure our website’s JavaScript implementation to use our URLs
Thankfully this part is quite easy too. For the purposes of this article we’ll assume that your worker’s subdomain is foo.bar.com.
In your website’s code, instead of adding a script tag for https://cdn.rudderlabs.com/v1/rudder-analytics.min.js you add one for https://foo.bar.com/dataPlane.
In your RudderStack initialization code, change
rudderanalytics.load(
RUDDER_WRITE_KEY,
DATA_PLANE_URL,
{ configUrl: SOME_URL }
);
to
rudderanalytics.load(
RUDDER_WRITE_KEY,
'https://foo.bar.com',
{ configUrl: 'https://foo.bar.com' }
);
The script tag ensures that you still get the JS SDK code, but technically from your own domain, not RudderStack’s. That request isn’t blocked because you’re simply including a script from your own domain. What could be more innocuous? ;)
The load() changes ensure that your data is sent to your own domain (foo.bar.com instead of something.dataplane.rudderstack.com) and that the source config is retrieved from https://foo.bar.com/sourceConfig/ as opposed to https://something.dataplane.rudderstack.com/sourceConfig/.
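If you would rather keep the worker domain in one place, the URLs above can all be derived from it. This is just a sketch: proxiedEndpoints is a hypothetical helper, foo.bar.com is the placeholder domain from this article, and the sourceConfigUrl field only shows what the SDK is expected to request once it appends ‘/sourceConfig’.

```javascript
// Hypothetical helper: derive every URL the page needs from the worker origin.
function proxiedEndpoints(workerOrigin) {
  const origin = workerOrigin.replace(/\/+$/, ''); // drop trailing slashes
  return {
    sdkScriptSrc: `${origin}/dataPlane`,        // goes into the script tag
    dataPlaneUrl: origin,                       // second argument of load()
    configUrl: origin,                          // configUrl option of load()
    sourceConfigUrl: `${origin}/sourceConfig/`, // what the SDK ends up calling
  };
}

const endpoints = proxiedEndpoints('https://foo.bar.com/');
console.log(endpoints.sdkScriptSrc); // https://foo.bar.com/dataPlane

// On the page, once the SDK script has loaded:
// rudderanalytics.load(RUDDER_WRITE_KEY, endpoints.dataPlaneUrl, {
//   configUrl: endpoints.configUrl,
// });
```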
That’s it!