Blocklists

Privacy Browser uses the EasyList blocklists. EasyList is formatted using the Adblock syntax.

Privacy Browser features only a partial implementation of the Adblock feature set. Some features cannot be implemented because WebView does not expose the necessary functionality. In the future, with the release of Privacy WebView, some of these may be implemented. Other features are left out in the interest of performance on handheld devices.

Lines that begin with [ are headers and are ignored by Privacy Browser.

Lines that begin with ! are comments and are ignored except to extract the version number of the file.

Lines that begin with @@ are whitelists and are processed according to the rules listed below.

Lines that begin with | match against the beginning of the URL.

Lines that end with | (or have entries that end with |) match against the end of the URL.
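As a rough illustration of the prefix and suffix rules above, the following sketch sorts raw lines by their markers. It is written in Python for brevity and is not Privacy Browser's actual code. The function names and category labels are hypothetical, as is the assumption that the version number appears in a comment of the form "! Version: ...". Entries that begin with || are deliberately not treated as initial entries here.

```python
def classify_line(line):
    """Return a label for a raw blocklist line based on its markers."""
    if line.startswith("["):
        return "header"      # ignored
    if line.startswith("!"):
        return "comment"     # ignored, except for the version number
    if line.startswith("@@"):
        return "whitelist"
    if line.startswith("|") and not line.startswith("||"):
        return "initial"     # matches against the beginning of the URL
    if line.endswith("|"):
        return "final"       # matches against the end of the URL
    return "main"


def extract_version(lines):
    """Return the text after '! Version:' in the first such comment, or None."""
    prefix = "! Version:"  # assumed comment format
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None
```

For example, classify_line(".swf|") returns "final", and a plain entry such as /banner/ads/ falls through to "main".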

Lines that contain \ are regular expressions. Checking a regular expression against a URL is relatively expensive in terms of CPU consumption. Luckily, EasyList only contains a small number of regular expressions.

Lines that contain * could all be processed as regular expressions, but that would be a significant performance issue. Instead, they are processed in the following way:

  1. If the entry also contains \ it is processed as a regular expression.
  2. If the entry begins or ends with * the wildcard character is stripped out. These are redundant as the Adblock syntax defines example.com, *example.com, example.com*, and *example.com* as being the same. In all cases, any URL that contains example.com will be blocked.
  3. If the entry contains text separated by * it is broken into segments and each one is checked against the URL. For example, if the entry is adserver.com/*.jpg the URL will be checked to see if it contains both adserver.com/ and .jpg. This could match a few URLs not intended by the original block entry. For example, it would match http://adserver.com/123.jpg but also other URLs where the segments come in a different order, like http://otherserver.com/123.jpg?somevaluethatincludesadserver.com/. But the number of false positives should be small and the benefit in processing speed is significant.
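The three wildcard rules above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code Privacy Browser uses. The names compile_wildcard_entry and matches are hypothetical, and passing an entry unchanged to Python's re module is an assumption about how a regular expression entry would be handled.

```python
import re


def compile_wildcard_entry(entry):
    """Turn an entry containing * into either a regex or a list of segments."""
    # Rule 1: a backslash marks the entry as a regular expression.
    if "\\" in entry:
        return ("regex", re.compile(entry))
    # Rule 2: leading and trailing wildcards are redundant, so drop them.
    stripped = entry.strip("*")
    # Rule 3: split on the remaining wildcards and keep the non-empty pieces.
    segments = [segment for segment in stripped.split("*") if segment]
    return ("segments", segments)


def matches(compiled, url):
    """Check a URL against the result of compile_wildcard_entry."""
    kind, value = compiled
    if kind == "regex":
        return value.search(url) is not None
    # Order is not checked, which is the source of the false positives.
    return all(segment in url for segment in value)
```

With the entry adserver.com/*.jpg, both http://adserver.com/123.jpg and the out-of-order URL from the example above match, because only the presence of each segment is tested.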

Lines that contain $ include filter options. Most of these are ignored because differentiating them would require access to more information than Android’s WebView exposes. The only two filter options that are processed are domain and third-party.

Lines that contain filter options preceded by ~ are inverse filter options. All of these are ignored except ~domain, which whitelists the domain but applies the filter to other domains. Lines that contain ~third-party are ignored, because they are typically used to block the loading of some type of filter resource (script, xmlhttprequest, etc.) from the main domain. Because Privacy Browser is not able to differentiate between these resources, applying the block would block all resources from the domain, rendering it completely unusable. This is something that might be addressed in the future.

Lines that begin with @@ and contain ~domain filters are ignored because it is uncertain exactly what the two mean when combined.
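The handling of filter options described above might look roughly like the following Python sketch. It is illustrative only. The function name parse_filter_options and the dictionary keys are hypothetical, and collecting the unprocessed options in a separate list is an assumption made for the example, since the text does not say what happens to the rest of such a line.

```python
def parse_filter_options(line):
    """Split a line into its pattern and the options that are processed.

    Returns None for lines that are ignored entirely.
    """
    result = {
        "pattern": line,
        "third_party": False,
        "domains": [],           # from domain=example.com
        "excluded_domains": [],  # from domain=~example.com
        "ignored_options": [],   # everything else
    }
    if "$" not in line:
        return result
    # The options follow the last $ on the line.
    pattern, _, option_text = line.rpartition("$")
    result["pattern"] = pattern
    for option in option_text.split(","):
        option = option.strip()
        if option == "third-party":
            result["third_party"] = True
        elif option == "~third-party":
            return None  # these lines are ignored
        elif option.startswith("domain="):
            for domain in option[len("domain="):].split("|"):
                if domain.startswith("~"):
                    result["excluded_domains"].append(domain[1:])
                elif domain:
                    result["domains"].append(domain)
        elif option:
            result["ignored_options"].append(option)
    # Whitelist lines combined with ~domain are ignored.
    if line.startswith("@@") and result["excluded_domains"]:
        return None
    return result
```

For the hypothetical line /ads.js$third-party,domain=example.com|~shop.example.com,script this yields the pattern /ads.js, the third-party flag, one included domain, one excluded domain, and script as an ignored option.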

Lines that begin with || match starting at the root of the domain. So ||badurl.com/ad.jpg blocks badurl.com/ad.jpg and www.badurl.com/ad.jpg but not notbadurl.com/ad.jpg. Implementing this logic correctly would be costly in terms of CPU processing, so Privacy Browser ignores || and processes the rest of the entry like any other. This could lead to a small number of false positives.

Some lines contain ^, which is a separator wildcard that matches against :, /, ?, =, and &. These wildcards are ignored, which could lead to a small number of false positives as example.com^ will match against intended URLs like http://example.com/ad.jpg and http://example.com:8443/ad.jpg, but also unintended URLs like http://example.computer.com/coolresource. Once again, the performance gain is worth the very small number of false positives.
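To make the trade-off concrete, here is a small Python sketch of what ignoring || and ^ amounts to. The helper name simplify_entry is hypothetical, and removing the characters outright is an assumption about how "ignored" is realized. What remains is a plain substring check.

```python
def simplify_entry(entry):
    """Drop the || anchor and every ^ separator, leaving a plain substring."""
    if entry.startswith("||"):
        entry = entry[2:]
    return entry.replace("^", "")


def is_blocked(entry, url):
    """Substring check against the simplified entry."""
    return simplify_entry(entry) in url


# Intended matches.
print(is_blocked("example.com^", "http://example.com/ad.jpg"))
print(is_blocked("example.com^", "http://example.com:8443/ad.jpg"))
# Unintended match: the separator would have prevented this one.
print(is_blocked("example.com^", "http://example.computer.com/coolresource"))
# Unintended match caused by ignoring ||.
print(is_blocked("||badurl.com/ad.jpg", "http://notbadurl.com/ad.jpg"))
```

All four checks report a match, which shows both the intended blocks and the two kinds of false positive described above.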

Lines that contain ##, ###, ##., #?#, and #@# control the hiding of HTML elements selected by class, id, or name. Android’s WebView does not expose these types of controls and the lines are ignored by Privacy Browser.

The raw entries from the blocklists are processed into the following 22 ArrayLists used by Privacy Browser.  Each blocklist has its own set of ArrayLists, which are checked in the following order.

  1. Main Whitelist
  2. Final Whitelist
  3. Domain Whitelist
  4. Domain Initial Whitelist
  5. Domain Final Whitelist
  6. Third-Party Whitelist
  7. Third-Party Domain Whitelist
  8. Third-Party Domain Initial Whitelist
  9. Main Blacklist
  10. Initial Blacklist
  11. Final Blacklist
  12. Domain Blacklist
  13. Domain Initial Blacklist
  14. Domain Final Blacklist
  15. Domain Regular Expression Blacklist
  16. Third-Party Blacklist
  17. Third-Party Initial Blacklist
  18. Third-Party Domain Blacklist
  19. Third-Party Domain Initial Blacklist
  20. Third-Party Regular Expression Blacklist
  21. Third-Party Domain Regular Expression Blacklist
  22. Regular Expression Blacklist

Initial lists check against the beginning of the URL. Final lists check against the end of the URL. Domain lists only check against certain domains. Third-party lists only apply if the root domain of the request is different from the root domain of the main URL. Regular expression lists follow the regular expression syntax. Each ArrayList item has one or more entries that derive from the original Adblock entry. In the case of domain ArrayLists, the resource request is only checked against the item if the first entry matches the domain of the main URL.
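The domain list rule can be pictured with the short Python sketch below. It makes several assumptions that the text does not confirm: that the remaining entries of an item are URL segments that must all be present, and that "matches the domain" means an exact match or a subdomain. The name check_domain_item is hypothetical.

```python
def check_domain_item(item, main_domain, resource_url):
    """Check a resource against one item from a domain list.

    item[0] is the domain the rule applies to; the rest are URL segments.
    """
    domain, *segments = item
    # Assumed meaning of "matches": the same host or a subdomain of it.
    applies = main_domain == domain or main_domain.endswith("." + domain)
    if not applies:
        return False  # the resource is not checked against this item at all
    return all(segment in resource_url for segment in segments)


item = ["example.com", "/ads/", ".js"]
print(check_domain_item(item, "www.example.com", "http://cdn.net/ads/x.js"))
print(check_domain_item(item, "other.org", "http://cdn.net/ads/x.js"))
```

The first call reports a match because the main URL is on example.com. The second does not, because the rule is never consulted for pages on other.org.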

Before a web page loads a resource, it is checked against the blocklists that are enabled in the following order:

  1. EasyList
  2. EasyPrivacy
  3. Fanboy’s Annoyance List
  4. Fanboy’s Social Blocking List

If a resource matches against a whitelist entry, the resource is allowed by that blocklist and checking moves on to the following blocklist.  If a resource matches against a blacklist entry, the loading of the resource is blocked by Privacy Browser and no more checking is performed.

Whitelist entries on one list override blacklist entries on the same list but not on subsequent lists. For example, if a resource is allowed by a whitelist entry on EasyList, checking will move on to EasyPrivacy. If the same resource is blocked by a blacklist entry on EasyPrivacy, the loading of the resource will be blocked by Privacy Browser.
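The interaction between whitelists and blacklists across several blocklists can be sketched as follows. This Python example collapses each blocklist's many lists into a single whitelist and a single blacklist of plain substrings, which is a simplification made for illustration. The Blocklist class, the check_resource function, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Blocklist:
    name: str
    whitelist: list = field(default_factory=list)
    blacklist: list = field(default_factory=list)


def check_resource(url, blocklists):
    """Return ("blocked", list name) or ("allowed", None) for a resource URL."""
    for blocklist in blocklists:
        # A whitelist match only ends checking for this blocklist.
        if any(entry in url for entry in blocklist.whitelist):
            continue
        # A blacklist match ends all checking.
        if any(entry in url for entry in blocklist.blacklist):
            return ("blocked", blocklist.name)
    return ("allowed", None)


easylist = Blocklist("EasyList", whitelist=["tracker.js"], blacklist=["/ads/"])
easyprivacy = Blocklist("EasyPrivacy", blacklist=["tracker.js"])
print(check_resource("http://example.com/ads/tracker.js", [easylist, easyprivacy]))
```

Here the EasyList whitelist entry overrides the EasyList blacklist entry, but the resource is still blocked by EasyPrivacy, matching the example in the paragraph above.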

Android stores all assets (like the blocklist data) in compressed files in the APK. On a Nexus 6P, decompressing and parsing the blocklists when Privacy Browser starts takes about 3 seconds. This is longer than I would like, but I am not sure at the moment how to shorten it.

On a Nexus 6P, checking a resource URL against the lists takes about 20-30 milliseconds, which is fast enough to be unnoticeable.

Privacy Browser has two additional blocklists: UltraPrivacy, which blocks trackers that EasyPrivacy allows, and another that blocks all third-party requests. A request is only considered third-party if the base domain of the request is different from the base domain of the main URL. For example, if www.website.com loads a picture from images.website.com, this is not blocked as a third-party request because they both share the same base domain of website.com. Blocking all third-party requests increases privacy, but this blocklist is disabled by default because it breaks a large number of websites. If enabled, blocking all third-party requests is processed before the other blocklists.
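A third-party check along these lines could be sketched in Python as shown below. Taking the last two labels of the host name as the base domain is a deliberate simplification and an assumption of this example. It gives the wrong answer for suffixes such as co.uk, and it is not necessarily how Privacy Browser computes the base domain. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def base_domain(url):
    """Return the last two labels of the host name (naive base domain)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_third_party(main_url, request_url):
    """A request is third-party if its base domain differs from the page's."""
    return base_domain(main_url) != base_domain(request_url)


print(is_third_party("https://www.website.com/", "https://images.website.com/pic.jpg"))
print(is_third_party("https://www.website.com/", "https://cdn.adserver.com/x.js"))
```

The first request shares the base domain website.com and is not third-party. The second has a different base domain and would be blocked when this blocklist is enabled.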

Further information about how the blocklists are parsed and applied can be found in the comments of the source code.