Friday, May 4, 2018

Choose Privacy Week: Your Library Organization Is Watching You

Choose Privacy Week
T. J. Lamana and I have written a post for Choose Privacy Week. It's mirrored here, but be sure to check out all the great posts there.

Your Library Organization Is Watching You

We commonly hear that ‘Big Brother’ is watching you, in the context of digital and analog surveillance such as Facebook advertising, street cameras, E-Zpass highway tracking or content sniffing by internet service providers. But it’s not only Big Brother, there are a lot of “Little Brothers” as well, smaller less obvious that wittingly or unwittingly funnel data, including personal identifiable information (PII) to massive databases. Unfortunately libraries (and related organizations) are a part of this surveillance environment. In the following we’ll break down two example library organization websites. We’ll be focusing on two American Library Association (ALA) websites: ALA’s Office of Intellectual Freedom’s Choose Privacy Week website (ChoosePrivacyWeek.org) and ALA’s umbrella site (ala.org).

Before we dive too deeply, let’s review some basics about the data streams generated by a visit to a website. When you visit a website, your browser software - Chrome, Firefox, Safari, etc. - sends a request containing your IP address, the address of the webpage you want, and a whole bunch of other information. If the website supports “SSL”, most of that information is encrypted. If not, network providers are free to see everything sent or received. Without SSL, bad actors who share the networks can insert code or other content into the webpage you receive. The easiest way to see if a site has a valid SSL certificate is to look at the protocol identifier of a url. If it’s ‘HTTPS’, that traffic is encrypted, if it’s ‘HTTP’ DO NOT SEND any personally identifiable information (PII), as there is no guarantee that traffic is being protected. If you’re curious about the quality of a sites encryption, you can check its “Qualys report”, offered by SSL Labs., which checks the website’s configuration, and assigns a letter grade. ALA.org gets a B; ChoosePrivacyWeek gets a A. The good news is that even ALA.org’s B is an acceptable grade. The bad news is that the B grade is for “https://www.ala.org/”, whose response is reproduced here in its entirety:

Unfortunately the ALA website is mostly available only without SSL encryption.

You don’t have to check the SSL Labs to see the difference. You can recognize ChoosePrivacyWeek.org as a “secure” connection by looking for the lock badge in your browser; click on that badge for more info. Here’s what the sites look like in Chrome:








Don’t assume that your privacy is protected just because a site has a lock badge, because the was is designed to spew data about you in many ways. Remember that “whole bunch of other information” we glossed over above? Included in that “other information” are “cookies” which allow web servers to keep track of your browsing session. It’s almost impossible to use the web these days without sending these cookies. But many websites include third party services that track your session as well. These are more insidious, because they give you an identifier that joins your activity across multiple websites. The combination of data from thousands of websites often gives away your identity, which then can be used in ways you have no control over.

Privacy Badger is a browser extension created by the Electronic Frontier Foundation (EFF) which monitors the embedded code in websites that may be tracking your web traffic. You can see a side-by-side comparison of ALA.org on the left and ChoosePrivacyWeek on the right:
ALA.org
ChoosePrivacyWeek.org

The 2 potential trackers identified by Privacy Badger on ChoosePrivacyWeek are third party services: fonts from Google and an embedded video player from Vimeo. These are possibly tracking users, but are not optimized to do so. The 4 trackers on ALA.org merit a closer look. They’re all from Google; the ones of concern are placed by Google Analytics. One of us has written about how Google analytics can be configured to respect user privacy, if you trust Google’s assurances. To its credit, ALA.org has turned on the "anonymizeIP" setting, which in theory obscures user’s identity. But it also has “demographics” turned on, which causes an advertising (cross-domain) cookie to be set for users of ALA.org, and Google’s advertising arm is free to use ALA.org user data to target advertising (which is how Google makes money). PrivacyBadger allows you to disable any or all of these trackers and potential trackers (though doing so can break some websites).

Apart from giving data to third parties, any organization has to have internal policies and protocols for handling the reams of data generated by website users. It’s easy to forget that server logs may be grow to contain hundreds of gigabytes or more of data that can be traced back to individual users. We asked ALA about their log retention policies with privacy in mind. ALA was kind enough to respond:
“We always support privacy, so internal meetings are occurring to determine how to make sure that we comply with all applicable laws while always protecting member/customer data from exposure. Currently, ALA is taking a serious look at collection and retention in light of the General Data Protection Regulation (GDPR) EU 2016/679, a European Union law on data protection and privacy for all individuals within the EU. It applies to all sites/businesses that collect personal data regardless of location.”
Reading in between the lines, it sounds like ALA does not yet have log retention policies or protocols. It’s encouraging that these items are on the agenda, but disappointing that it’s 2018 and these items are on the agenda. ALA.org has a 4 year old privacy policy on its website that talks about the data it collects, but has no mention of a retention policy, or of third party service use.

The ChoosePrivacyWeek website has a privacy statement that’s more emphatic:
We will collect no personal information about you when you visit our website unless you choose to provide that information to us.
The lack of tracking on the site is aligned with this statement, but we’d still like to see a statement about log retention. ChoosePrivacyWeek is hosted on a DreamHost WordPress server, and usage log files at Dreamhost were recently sought by the Department of Justice in the Disruptj20.org case.

Organizations express their priorities and values in their actions. ALA’s stance toward implementing HTTPS will be familiar to many librarians; limited IT resources get deployed according competing priorities. In the case of ALA, a sorely needed website redesign was deemed more important to the organization than providing incremental security and privacy to website users by implementing HTTPS. Similarly, the demographic information provided by Google’s advertising tracker was valued more than member privacy (assuming ALA is aware of the trade-off). The ChoosePrivacyWeek.org website has a different set of values and objectives, and thus has made some different choices.

In implementing their websites and services, libraries make many choices that impact on user privacy. We want librarians, library administrators, library technology staff and library vendors to be aware of the choices they are making, and aware of the values they are expressing on behalf of an organization or of a library. We hope that they will CHOOSE PRIVACY.

2 comments:

  1. At first, I thought your post was about libraries using loan history etc. to track clients.

    When suggesting books to my library, they actually say they are tracking me:
    "Your email is subject to disclosure under … Public Records Law and is not privileged or confidential except to the extent it contains customer account information. It will be retained for 30 days."

    ReplyDelete
    Replies
    1. That's why libraries subject to public records laws delete their records regularly. Privacy concerns are why most libraries that offer to remember the loan history only do so on an opt-in basis. Unfortunately, some of those records leak when technical aspects of a website are poorly understood by library staff.

      Delete

Note: Only a member of this blog may post a comment.