Using "list quality" to improve the accuracy of your A/B testing
An experiment from GetUp.
Another in the GetUp experiment series, this article looks at approaches to email analytics.
Background
As discussed in previous experiments, no two GetUp emails are sent to the same list. Because of the way we cut our lists up, we’ve had trouble comparing emails, since they usually go to different groups of members with different activity levels.
To help us compare emails, we created a score called ‘list quality’, which rates each member according to the actions they took in the year before the point at which quality is measured – for emails, that point is immediately before the send. These member scores can then be averaged across an entire email list to estimate the results we would expect for an email sent to that list.
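As a rough illustration, here is how such a score might be computed. This is a minimal sketch in Python with pandas; the `actions` table, its column names and the simple summed weighting are all assumptions for illustration, not GetUp’s actual scoring formula.

```python
from datetime import timedelta

import pandas as pd

def list_quality(actions: pd.DataFrame, member_ids, measured_at: pd.Timestamp) -> float:
    """Average per-member score for a list, counting only actions taken
    in the year before `measured_at` (for emails, just before the send)."""
    window = actions[
        (actions["action_date"] > measured_at - timedelta(days=365))
        & (actions["action_date"] <= measured_at)
    ]
    per_member = window.groupby("member_id")["weight"].sum()
    # Members on the list with no actions in the window score zero.
    return per_member.reindex(member_ids, fill_value=0).mean()
```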
In this document I’ll run through how we’ve used list quality scores to build benchmarks for various email metrics, which in turn set goals for campaign emails.
Data collection
I collected statistics on all emails sent since January 2014 and aggregated them by ‘blast’, which is part of the GetUp website structure. A blast can contain multiple emails with different content, but every email in a blast goes to the same list, with nobody receiving more than one email from the blast. Our campaigners commonly use multiple emails within a blast to test subject lines or other variations in content, and aggregating at the blast level keeps those variants from being counted as separate data points.
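In code, the aggregation might look like the snippet below. The `emails` DataFrame and its column names are assumed for illustration, not GetUp’s actual schema.

```python
# Roll per-email statistics up to the blast level, so A/B variants
# within a blast are treated as a single data point.
blasts = (
    emails.groupby("blast_id")
    .agg(
        sends=("sends", "sum"),
        opens=("opens", "sum"),
        clicks=("clicks", "sum"),
        dollars_raised=("dollars_raised", "sum"),
    )
    .reset_index()
)
```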
Data processing
This left me with a dataset of almost 2000 blasts. This data includes the number of sends, opens and clicks for each blast, the number of actions broken down by action type, and the amount of money raised from the email.
I then narrowed the scope to blasts sent to at least 10,000 people. We had a bunch of emails sent to very small numbers of members, and they weren’t useful for assessing broader trends.
All blasts were classified according to the main action type the email generated; if a blast didn’t produce 10 or more actions of any one type, it was classified as ‘no action’. The action types were donations, petitions, calls and emails (the latter two referring to calls and emails to campaign targets such as Members of Parliament).
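A sketch of the filtering and classification steps, again with assumed column names (one count column per action type):

```python
ACTION_TYPES = ["donations", "petitions", "calls", "emails"]

# Drop small sends: keep only blasts reaching at least 10,000 members.
blasts = blasts[blasts["sends"] >= 10_000].copy()

def main_action_type(row) -> str:
    """Label a blast with its dominant action type, or 'no action'
    if no single type reached 10 actions."""
    top = max(ACTION_TYPES, key=lambda t: row[t])
    return top if row[top] >= 10 else "no action"

blasts["action_type"] = blasts.apply(main_action_type, axis=1)
```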
Dividing blasts into ‘buckets’ based on list quality
I divided email blasts into buckets based on their list quality. Because there are fewer donation emails, I grouped donations into a smaller number of buckets than in the other analyses.
Each bucket contains all of the emails whose receiving list had an average member score within that range – for example, the ‘2 to 4’ bucket covers all emails whose list scored higher than 2 and lower than 4.
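Bucketing could be done with pandas’ `cut`; the bucket edges below are illustrative placeholders, not the ones used in the analysis.

```python
# Finer buckets for most analyses, coarser buckets for donation emails.
general_edges = [0, 2, 4, 6, 8, 10]
donation_edges = [0, 4, 8, 12]

# pd.cut produces half-open intervals such as (2, 4], which is one way
# to make the bucket boundaries unambiguous.
blasts["quality_bucket"] = pd.cut(blasts["list_quality"], bins=general_edges)
```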
Calculating benchmarks
The following metrics were calculated (a computation sketch follows the list):
- Open rate for emails (number of members viewing the email over the number of members receiving the email)
- Action rate for emails with a petition or email ask (number of members taking action over the number of members receiving the email)
- Fundraising efficiency (dollars raised per email sent), for emails with at least 10 one-off donations and at least $500 in one-off donations
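A sketch of these calculations, using the same assumed column names as above (`actions`, `one_off_donations` and `one_off_dollars` are likewise hypothetical):

```python
blasts["open_rate"] = blasts["opens"] / blasts["sends"]
blasts["action_rate"] = blasts["actions"] / blasts["sends"]
blasts["dollars_per_send"] = blasts["dollars_raised"] / blasts["sends"]

# Fundraising efficiency only makes sense for genuine fundraising sends:
# at least 10 one-off donations and at least $500 in one-off dollars.
fundraisers = blasts[
    (blasts["one_off_donations"] >= 10) & (blasts["one_off_dollars"] >= 500)
]
```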
For each metric, I then calculated the first quartile, median and third quartile for each bucket of emails. This provides a sense of the normal range of results for an email list of a particular list quality.
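The per-bucket quartiles then fall out of a groupby – for example, for open rate:

```python
# 25th, 50th and 75th percentiles of open rate within each quality bucket.
benchmarks = (
    blasts.groupby("quality_bucket", observed=True)["open_rate"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
```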
Note on confidentiality
Publishing these benchmarks would reveal a great deal of private information about GetUp action rates and open rates, so instead I’ve included charts showing how they change relative to list quality, with the Y axis hidden for privacy reasons.