Australian Metadata (1): What are Metadata?



The Australian Prime Minister and his security advisors want to put in place legislation that allows the collection and retention of so-called ‘metadata’ in relation to Australians’ communications and electronic intercourse; telephone and email records and internet browsing history may therefore all fall into the net. Confusion reigns at the moment in regard to this proposal, not least because the Prime Minister and certain of his colleagues do not seem to know what metadata are. The Prime Minister attempted to explain this by saying that “it is like the address on an envelope of a letter, the data being the actual contents of the letter, and this is a metaphor I think Australians can understand”. I take the suggestion here to be that Aussies are thick and uneducated but at least they know what a letter is. But unlike the Prime Minister, they probably also realise that an individual letter would be a datum, not data, and that what he is using is an analogy, not a metaphor (we have a leader who uses language in the same way he walks). So, to assess the proposal to retain metadata and to have access to them, we need to know what metadata are. The purpose of this post is to provide what I think is a helpful explanation. The next one will appraise the proposal itself.


Philosophers are familiar with terms which have the prefix “meta”: there is metaphysics, metalanguage, even metaphilosophy. In each such case, the prefix signifies what the discipline is about: metaphysics is about physics, etc. However, as we shall see, there are different senses in which something can be ‘about’ something else. Turning to data, these are familiar from a number of fields. A datum is a record of some specific and particular fact of interest, and a set of data is a collection of records of such facts. For instance, the collection and systematisation of data is the substance of much routine scientific research. This may amount to the measurement of the properties of physical or biological systems. So measurements of the pressure, volume and temperature of a gas in a cylinder fitted with a piston are three data, and a series of such measurements is a set of data. Suppose in addition there is a record of who took the measurements, where they were taken, etc. These records are then metadata with respect to the originals: they are facts about the facts, data about the data.


There can be different sorts of data about data, depending on what is of interest. Suppose each set of readings in the example just considered is entered into a computer which graphs them. Various possibilities are tested by the programme used, and it turns out that there is a simple linear relation. The data have been organised in a given way and relations between the data points have been established. This too amounts to metadata, but it is structural in that it is about the form, organisation and inter-relations between the data. It is, I think, best to distinguish this kind of metadata from the ‘simple’ kind discussed in the previous paragraph by characterising it as essentially relational: to generate structural metadata one needs more than one metadatum in order to identify relationships.
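To make the idea of a programme testing for a linear relation concrete, here is a minimal sketch in Python. The names `Reading` and `fit_line` are my own, the three readings are invented purely for illustration, and I have assumed that the relation in question is between temperature and volume; nothing here is meant as the actual procedure a laboratory would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One datum pair: a temperature and the gas volume measured at it."""
    temperature_k: float
    volume_l: float


def fit_line(readings):
    """Least-squares fit of volume against temperature.

    The returned slope and intercept describe a relationship between the
    readings, so they are structural metadata rather than data.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("a relationship needs more than one reading")
    mean_t = sum(r.temperature_k for r in readings) / n
    mean_v = sum(r.volume_l for r in readings) / n
    sxx = sum((r.temperature_k - mean_t) ** 2 for r in readings)
    if sxx == 0:
        raise ValueError("all temperatures are identical; no line can be fitted")
    sxy = sum(
        (r.temperature_k - mean_t) * (r.volume_l - mean_v) for r in readings
    )
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


# Invented readings, for illustration only.
readings = [Reading(300.0, 3.0), Reading(350.0, 3.5), Reading(400.0, 4.0)]
slope, intercept = fit_line(readings)
```

Note that `fit_line` refuses to run on a single reading, which is the point made above: a relationship cannot be identified from one record alone.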


Now consider another kind of data and metadata (Prime Minister, pay attention!). Suppose you recorded all your telephone conversations for a month; the recordings constitute another kind of data. If the conversations had not been recorded, then there would, evidently, be no record of them. They would literally be gone, unless you or the person you were speaking to could remember them word for word. Continuing with the assumption that the conversations are recorded, when you got your phone bill, each of them would have a corresponding record on the bill, assuming it to be fully itemised. But the data on the bill would not be a record of the conversations, but of matters such as their duration, date, the number (and hence the person or institution) called, how much the call cost and so forth. Thus the records on the bill are metadata: they are data about data, what I have called simple metadata. If the records of the conversations were sorted and stored in a certain way, for instance if all the calls to the same number were grouped together, listed by date and duration and any patterns noted, then that information, about the relationships between the data, would amount to structural metadata.
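The telephone example can be put in a few lines of Python. This is only a sketch under my own assumptions: the names `Call`, `bill_line` and `group_by_number` are hypothetical, the numbers and dates are invented placeholders, and a real bill would of course carry more fields (cost, for one).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Call:
    recording: str  # the data: what was actually said
    number: str     # the number called
    day: date       # when the call was made
    minutes: int    # how long it lasted


def bill_line(call):
    """Simple metadata: facts about the call, with the content left out."""
    return {"number": call.number, "day": call.day, "minutes": call.minutes}


def group_by_number(bill):
    """Structural metadata: bill lines grouped by number, listed by date."""
    grouped = defaultdict(list)
    for line in bill:
        grouped[line["number"]].append((line["day"], line["minutes"]))
    return {number: sorted(entries) for number, entries in grouped.items()}


# Invented calls, for illustration only.
calls = [
    Call("...first conversation...", "NUMBER-A", date(2000, 1, 3), 12),
    Call("...second conversation...", "NUMBER-B", date(2000, 1, 2), 4),
    Call("...third conversation...", "NUMBER-A", date(2000, 1, 1), 5),
]

bill = [bill_line(call) for call in calls]
patterns = group_by_number(bill)
```

The bill is built without ever copying the `recording` field, so it would look exactly the same whether or not the recordings were kept.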


Notice that even if the conversations were not recorded, the bill would still record the same information about the duration of the call, etc. What this example shows is that metadata can exist when the matters they are about are lost; the metadata are then the only record that remains. Notice how this might make a difference. Suppose you believed that there were mistakes on your previous bills and you therefore recorded your conversations. Those records, those data, could then be checked against the metadata on the bill, which, at least in theory, could be amended in your favour. If the recordings were lost, then you would have no direct evidence that your bill is mistaken. This is an important point for all forms of electronic intercourse.



Consider internet browsing; it also leaves metadata. You visit a certain webpage, such as this one, for a given amount of time, looking at certain content, such as this blog or the pages on weapons research. The content of the page is, as before, not recorded in the metadata, but not only will your visit leave a record of the website, there will also be a record of the pages and even of each individual blog post. If you stumble across the site by mistake, there will still be a record. Now in this kind of case, the metadata and the data will amount to the same information, provided that the webpage accessed remains in existence. Granted that internet browsing metadata, stored by ISPs, record who (which IP address) browses which websites, the content can be retrieved, provided it still exists. So when it comes to internet browsing, it is not true that metadata are like “the address on the envelope”. Moreover, the internet browsing data can be structured and classified in various ways. One such way would be to collect all the IP addresses which access a given website and try to infer further relationships between the users, and so structural metadata could in this way define networks. The Australian Government has not, however, made it clear just which metadata it will try to collect. In my next post on the topic, I will look at what has been said about this issue, and then turn to the moral implications of the proposed legislation.
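As a footnote to the point about networks: the step from a browsing log to a network of users is short. The sketch below is my own illustration, not a description of what any ISP or agency actually does. The function names are hypothetical, the IP addresses come from the range reserved for documentation, and the site names use the reserved `.example` domain.

```python
from collections import defaultdict
from itertools import combinations


def visitors_by_site(log):
    """Collect the IP addresses that accessed each website."""
    sites = defaultdict(set)
    for ip, site in log:
        sites[site].add(ip)
    return dict(sites)


def shared_site_links(log):
    """Link every pair of IP addresses that visited the same site.

    The result maps a pair of addresses to the sites they have in common.
    This is structural metadata: it says nothing about page content.
    """
    links = defaultdict(set)
    for site, ips in visitors_by_site(log).items():
        for a, b in combinations(sorted(ips), 2):
            links[(a, b)].add(site)
    return dict(links)


# Invented log entries of the form (IP address, website), for illustration.
log = [
    ("192.0.2.1", "site-one.example"),
    ("192.0.2.2", "site-one.example"),
    ("192.0.2.1", "site-two.example"),
    ("192.0.2.3", "site-two.example"),
    ("192.0.2.2", "site-one.example"),
]

links = shared_site_links(log)
```

Two addresses end up linked simply because they appear against the same site, whether or not the people behind them have ever heard of each other.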
