Data testing: Leveraging the network layer to support advanced testing scenarios

Effective testing is crucial for any data pipeline. In this article we consider how advanced networking infrastructure can enable data testers to validate client devices that are traditionally more locked down, and on which standard test methods are therefore not possible.

The challenge

From a high-level engineering perspective, ‘testing’ reduces to the problem of comparing a specification against some real-world behaviour of an app or device, at a defined point in a development pipeline (perhaps we have a release candidate build the business is looking to ship). In practice, in the context of data collection, this means inspecting the contents of ‘network calls’ (the ‘behaviour’) and ensuring these match an expected set of values. ‘Network calls’ can be thought of as packets of information a client app generates and sends over the public internet to some service for processing and presentation (this is how, for instance, Adobe Analytics or Google Analytics receive the data shown in their reports).
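The comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the parameter names, the expected values and the collect.example.com URL are all hypothetical placeholders, and a real specification would usually be more extensive.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical specification: query parameter name -> expected value.
EXPECTED = {"pageName": "home", "events": "event1"}

def check_hit(url, expected):
    """Compare one captured network call against the specification.

    Returns a dict of mismatches: parameter -> (expected, actual).
    An empty dict means the call matches the specification.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    mismatches = {}
    for name, want in expected.items():
        values = query.get(name)
        got = values[0] if values else None  # None means the parameter is missing
        if got != want:
            mismatches[name] = (want, got)
    return mismatches

# Hypothetical captured call; the host and path are placeholders.
captured = "https://collect.example.com/b/ss/demo/1?pageName=home&events=event2"
print(check_hit(captured, EXPECTED))
```

Here the function reports only the parameters that deviate from the specification, which keeps both manual review and automated checks focused on failures.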

For certain types of devices, particularly mobile phones and PCs/Macs, inspecting network calls for validation is relatively easy. There exist tools that can be configured to sit ‘between’ a device and a particular service backend upstream on the public internet (such as Adobe’s data collection edge). These tools are sometimes referred to as ‘packet analysers’ or ‘proxies’; Charles and mitmproxy are two examples. With such a tool in place, all network calls (in particular, those requiring inspection for testing purposes) are sent first through the proxy, which records and visualises the data a given client is sending over the internet to a service. We’re therefore able to confirm correctness (by automated or manual methods) and can save records of behaviour for future reference and analysis.
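As an illustration of working with saved records, the sketch below assumes a proxy session has been exported in the HAR (HTTP Archive) JSON format and filters it down to the calls sent to one collection host. The sample data and the host name are invented for the example, and only the log.entries[].request.url fields of the HAR structure are used.

```python
import json
from urllib.parse import urlsplit

def hits_for_host(har_text, host):
    """Return the request URLs in a HAR document that target `host`."""
    har = json.loads(har_text)
    urls = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry["request"]["url"]
        if urlsplit(url).hostname == host:
            urls.append(url)
    return urls

# Invented sample standing in for an exported session (assumption, not real data).
SAMPLE_HAR = json.dumps({
    "log": {"entries": [
        {"request": {"method": "GET", "url": "https://collect.example.com/hit?page=home"}},
        {"request": {"method": "GET", "url": "https://cdn.example.com/app.js"}},
    ]}
})

print(hits_for_host(SAMPLE_HAR, "collect.example.com"))
```

Filtering by host first means any later validation step only sees the data collection calls, not the rest of the traffic the device generates.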

Preparing a client for data testing as described above depends, however, on the host device-under-test supporting specific network settings. Certain devices support the configuration of upstream proxies out of the box, in which case test preparation is as simple as entering two pieces of information (an IP address and a ‘port’) in the device’s networking settings. Other types of device are traditionally more locked down, in that they expose fewer user-facing networking options. This is typical of ‘connected devices’, such as set-top boxes (or some games consoles), and in these cases data testing via proxy is simply not possible, at least not in a direct sense, because we can’t define the specific networking settings required.
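To make the ‘IP address and port’ idea concrete, here is a small sketch of the same configuration expressed in software, using Python’s standard urllib as a stand-in for a client that does support proxy settings. The address and port are hypothetical placeholders for wherever the proxy tool happens to be listening on the test network.

```python
import urllib.request

# Hypothetical values: the machine running the proxy tool and its listening port.
PROXY_HOST = "192.168.1.50"
PROXY_PORT = 8888

def proxy_url(host, port):
    """Combine the two required settings into a single proxy address."""
    return f"http://{host}:{port}"

def build_proxied_opener(host, port):
    """Return a urllib opener that routes HTTP and HTTPS requests via the proxy."""
    address = proxy_url(host, port)
    handler = urllib.request.ProxyHandler({"http": address, "https": address})
    return urllib.request.build_opener(handler)

opener = build_proxied_opener(PROXY_HOST, PROXY_PORT)
# Requests made with opener.open(...) would now be sent to the proxy first.
# No request is made here, so the sketch runs without a proxy being present.
print(proxy_url(PROXY_HOST, PROXY_PORT))
```

A locked-down device offers no equivalent of this step, which is the gap the rest of the article addresses.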

The solution

We can solve the problem above by recognising that, from a networking perspective, ‘proxy’ functionality can be considered independently of any given client device. In other words, if we can move the proxy functionality further upstream, in a way that’s invisible to the client device, the problem is solved.

Indeed, there exists advanced networking gear on the market that offers the exact functionality we require. These devices serve as standard wired or wireless internet access points but offer the ability to set more feature-rich configuration options at a higher layer in the networking ‘stack’. Crucially, such devices include features that allow users to set a proxy at the level of the access point, i.e., one layer ‘above’ the client. As far as the client is concerned, it’s just connected to the internet, unaware of any upstream manipulation. The process is akin to setting a proxy on your wireless router at home, instead of on the device-under-test as would traditionally be the case.

Once the required settings are defined, we can continue to test using tools like Charles as normal. What we’ve done is tell the device-under-test to talk to our network access point intermediary. Because the intermediary has a proxy configured, it forwards all network calls to the proxy (e.g. Charles), where we can confirm data correctness. Charles finally forwards all hits to their original internet destinations. We typically must use this more involved data testing practice for devices like set-top boxes. This contrasts with cases in which we can proxy directly (e.g. most mobile phones), where the device itself sends network calls straight to the proxy with no need for an intermediary.

Caveats

Data testing via network intermediaries is inherently more complex from a setup and maintenance point of view. In our experience this caveat is largely manageable. Lynchpin have found cases in which client network traffic is encrypted (a common best practice in modern software engineering) to be the more challenging caveat. Indeed, it does not currently seem feasible to configure network intermediaries to support the upstream proxying of encrypted traffic. Such network calls are not blocked (i.e. they still reach their original destinations), but they are essentially not visible at the higher levels of the network stack, so they do not pass through tools like Charles.

Conclusion

The method of data testing described here (i.e. ‘testing by proxy’) is essentially a ‘man-in-the-middle’ (or ‘MITM’) attack, a type of network attack that computer security specialists study and seek to prevent. Some devices support proxying directly and others don’t, in which case we need to call upon more creative solutions to enable validation. Whether we can proxy directly or need to use upstream network gear as discussed, the basic concept remains the same: between the client-under-test and some backend service we place a third party, under our control, that can inspect and record behaviour. Data testing as described in this article is a harmless use of the MITM technique, but this nonetheless explains why QA can sometimes be challenging when dealing with certain client apps: testers are essentially trying to circumvent the internet’s network safety model to perform QA. In other words, we’re trying to do something modern internet security infrastructure is specifically designed to prevent, albeit in a completely controlled manner.

About the author

Thomas Cumming

Thomas holds a Masters of Informatics qualification from Edinburgh University.

Thomas has worked on a broad range of data projects throughout his time at Lynchpin – from building automated testing and validation systems for marketing technologies, to leading a technical transformation of marketing technology across multiple devices and systems for a leading broadcaster.

Thomas has significant experience in Adobe Analytics and related technologies. Combined with his experience in Microsoft Azure and experience of developing data warehousing architectures, he is able to advise clients on transformation projects to bring clarity to complex customer behaviour or sales and marketing challenges.
