Big data: promoting development and safeguarding privacy
23 October 2013 - A Workshop in Bali, Indonesia
This text is being provided in a rough draft format. Communication Access Realtime Translation (CART) is provided in order to facilitate communication accessibility and may not be a totally verbatim record of the proceedings.
>> MODERATOR: Hello? Hello. Does this work? Hello?
Shall we perhaps get started? We are still trying to sort out remote participation. The electronics of it, I'm not sure it is sorted out yet but I propose we start and we test it as we go along.
Welcome, everyone. Good morning and welcome everyone to workshop 203. This is on big data, promoting development and safeguarding privacy.
We have a great set of panelists here that I will introduce in a moment. The workshop has been prepared in cooperation between the Council of Europe and the OECD. I stepped in for Sophie Kwasny who participated in the preparation, and who should have been your moderator, so if everything goes well, the merit is for the OECD colleagues and for Sophie Kwasny. If everything goes foul, you can put it on me.
There was a background paper, an excellent paper, better policies for better lives that was prepared by the OECD and published on line for you. I will not go into it, but it's certainly recommended reading.
We have some panelists up here, and we have two remote panelists as well that I hope we will be able to introduce and hear and see probably at a distance. So let me go directly to introducing the panelists. I will say who they are and where they come from very, very briefly. If they want to add something in respect of themselves or their organizations, they will be able to do it later on.
I give them in this order. There's no particular order to this. We have Alexandrine Pirlot de Corbion, who comes from Privacy International. She's dealing with research and promoting privacy issues in a range of countries and continents, in Asia, Africa and Latin America and so on.
We have Jochai Ben‑Avie from Access, an organization that promotes access to the Internet and human rights on line, especially in situations where there are particular difficulties in respect of that. He's policy director at Access.
We have Bill Woodcock, who is from Packet Clearing House. I think we will need to hear a bit more about that. But Bill knows about the pipelines and how they work, and what does that mean and how they can be monitored and collected and so on. So I think that he will be a particularly good asset for us in this workshop.
We also have Marie George. She is a Council of Europe expert. She's very knowledgeable about both national and international legal frameworks in respect of privacy and data protection, including on line. She has very rich hands‑on experience, because she previously was with the data protection authority in France.
And then we have two remote participants. We have Robert Kirkpatrick from UN Global Pulse, which is a laboratory, an experiment set up by the UN Secretary-General in order to explore use of big data for development and for positive use in a UN development context.
And we have Christian Reimsbach from the OECD, who is one of the co‑organizers of this workshop. We also have in the room Verena Weber from the OECD and she will be a contributor if necessary later on.
Now, when I was thinking about how to introduce this workshop, I was not sure what to say. We have different elements here. I remember that Commissioner Kroes, Commissioner for the Digital Agenda in the European Union, some time back said that our governments in Europe, in the then 27, now 28, are sitting on a data gold mine worth tens of billions of euros. We have some idea of what that means but we're not too sure.
If we see privacy on line in the traditional sense, from the user perspective, very often what we see is the situation of a little box that has to be clicked, and the best you can see when looking at it is, well, if you decide to use this and click here, you are authorizing us to use your data for commercial or marketing purposes, and we may share this data with our commercial partners.
And you as the user ‑‑ participants in Stockholm a couple of years ago described it; they said, well, you click yes and hope for the best. So that is one approach. But I thought that is not what you are going to be discussing and I thought how can we turn this around and see what does it really mean. What does it really mean in the future in terms of importance, of risk, of advantages as well, and so on, and I came across very recently some information about climate change. The way we see very often climate change is described as Kyoto and things like that and take up and whether or not there is a tradeoff between climate and the economy and the risks to development if we don't or if we do and so on.
But I came across a different presentation of that. And I thought, well, what would the user say if we present things differently? What I saw is the question of climate change is killing 300,000 people every year. By the year 2030, apparently, research says that there will be no more ice on the Arctic Circle. That's a different way of seeing it. I saw a tweet a couple of days ago that said that by 2054, two and a half billion people will die as a result of climate change. I thought, well, can we turn that around? Can we say something about big data to introduce the session in those terms, to really tell the user, tell people out there, what does it really mean for them? Two and a half billion deaths by 2054. Can we say that kind of thing in respect of big data? And I thought, well, maybe some of the information that we have been confronted with in recent times in respect of big data suggests that if you use this service for free, we will use your data, correlated with other data sets out there, and we will be able to shape your consumption. Maybe in the meantime, we'll be able to shape your health choices. Maybe later on, we will be able to shape your and your community's political decisions. Is that a different way of presenting it? Is it a realistic way of presenting it?
On a tweet, it could sound like this. Click I agree, and your life will be in our algorithms.
Now, I think it's time to move to our panelists. Of course what I said is a big exaggeration. But I would first of all ask Marie George, you are concerned by big data. Could you tell us why? Could you tell us just in a few sentences what is it for you?
>> MARIE GEORGE: Hello. Thank you for inviting me.
Big data. What is it in reality? Enterprises, governments, collect data. You know that. You give them the data. You have some obligation and so forth.
Now, we have been in this situation for a long time. So it means that you have in the hands of others much data about you. Much, much, much. And some people, scientists and everything, would like to use them and to say, well, we can make some knowledge from that. Making data speak for themselves, data mining and so forth.
And so some ‑‑ I am surprised because here, since yesterday, I heard twice, an IT enterprise saying we should not look so much at the collection of data but maybe ‑‑ but more to use of the data. What could be the use ‑‑ innovative use of big data? Well, we will see.
I think there are three areas in which we have to think about this relation between personal data collected and maybe by different ways. You are in connection with your bank by telephone, by email, by paper you sign for contracts sometimes and so forth. All this going on for years. So we have to look at big data in my view in three things.
About purpose. I go back to the very classical, I'm sorry, but sometimes classical things are very useful. Big data for which purpose? So you can make calculations on big data about what? About how the enterprise is running. It concerns employees. You can do better things with that, and to maximize what? What are you going to maximize through those calculations?
It can be in relation with the client, with the patient. I will further on give examples. And also about research on society. And I think we will have to discuss these different things, because the implications are different, and we will see further on if there are solutions. How to ‑‑ what kind of things we can have in our head to look at a precise big data operation. Is that okay?
>> MODERATOR: That's a very good beginning. Thank you.
Can I move to Bill, and ask him whether the location of data, the transit of data, the processes of data out there in the open, they have a bearing on all this and on the issues that we are discussing.
>> BILL WOODCOCK: Yes, certainly.
I think we're seeing a lot just over the course of the last few months about, you know, the NSA, wiretapping, and so forth, but that's of course just the tip of the iceberg, and I think the fact that no European countries are leaping forward to speak out against that indicates perhaps that the NSA is not the only organization doing this or the U.S. the only country doing this.
So I think ultimately what we need to be looking at is default operations. What it is that gets done by default with the actions that users take, whether those actions are processed and dealt with and the user's request is fulfilled and that's the end of it or by default whether everything is recorded and logged.
People responsible for making things work will almost always, if they have the option, log everything by default so when things don't work they can figure out why and fix it. And that seems like a perfectly reasonable approach to things. It makes things work better. It makes things ultimately more reliable. It gives the user a better experience, and there's a very compelling set of reasons for taking that default. The problem is that disk is really cheap and attention is really expensive and so once you're recording everything, the easiest thing to do when you run out of disk space is buy another disk and that means that all of that old data is still there. You could alternatively take a bunch of time and attention and try to anonymize the old data, scrub it so that you still have something useful there if you need to go back and debug a problem but it wouldn't expose anything about a person. But the problem with that is we've seen in study after study that anonymization is effectively impossible. The basic problem with big data is you correlate one source of data with another source of data with another source of data and pretty soon you can create a very three‑dimensional picture of a person or an activity or a place without there ever having been any one omniscient point of view. So having these old troves of accidental data sitting around is a frighteningly appealing target.
You know, data also can be copied at no cost, right? Taking data that one person has and replicating it makes a copy of the data and doesn't take it away from the first person. Therefore, stealing data that somebody has is not necessarily an obvious thing to the person from whom it's stolen. It doesn't necessarily disadvantage them. If they're not disadvantaged, then they aren't necessarily going to admit that it happened or try to get it prosecuted or try to even necessarily make it difficult.
Therefore, if these troves of user data are sitting around, people can steal them and correlate them with other sources of data, and nobody who collected the data even is necessarily going to be aware that that happened or have any reason to try and make that difficult.
So I think the technological side of this has to do with, first of all, defaulting to collecting data that doesn't need to be collected. Secondly, storing it when it doesn't need to be kept, and thirdly, shipping it around to somewhere that may happen to be convenient but passes a lot of eyes in the process.
>> MODERATOR: Well, that gives rise to some concern, and we seem to be seeing only the tip of the iceberg. So I think that policy makers thinking ahead should try to see under the surface as well. Thank you very much for that.
I would suggest that we move, if our remote participation is working, that we move to Robert Kirkpatrick to tell us a bit also about the positive side, if the remote participation is working, if he could connect. Is that possible? Robert, can you hear us? Can you speak to us? Robert? Hello? Do we have a connection?
We don't seem to have a connection. I would have wanted Robert to tell us about the positive use that the possibilities that big data offers for development.
>> ROBERT KIRKPATRICK: I am here. Testing, testing, testing. 1, 2, 3, 4. Can you hear me?
>> MODERATOR: We hear you loud and clear. Please go ahead.
>> ROBERT KIRKPATRICK: Okay. Very good. Thank you.
Well, good morning, everyone. So I just wanted to give you a very quick snapshot of Global Pulse and our view on big data within the Secretary-General's office. Global Pulse is an initiative that was created out of the global financial crisis back in 2009 and as noted by our moderator, this is essentially a lab for the UN system to learn how to take advantage of all this data that's out there for development and humanitarian action. We're based in New York, our headquarters. We launched full‑fledged in Jakarta in Indonesia last year, in October, and we're in the final stages of preparing to launch our second post in partnership with the government of Uganda, in Kampala.
Essentially we find ourselves today living in a hyperconnected world where our need for speed has never been greater because of the pace of change that's accelerating around us. The irony is that increasingly we're noting that ‑‑
(Lost connection to the remote participant.)
>> MODERATOR: We seem to have lost you. Can we get him back? Is it possible? Huh? He disconnected? Can we come back to you a little later? Perhaps can we connect with our other remote participant, with Christian? He was going to ‑‑ I hoped he would tell us about also the positive use, the value of big data for society, but also for the economy and for development. Can we have either of them?
>> CHRISTIAN REIMSBACH: Can you hear me?
>> MODERATOR: Yes, we can. Welcome.
>> CHRISTIAN REIMSBACH: Thank you. Yes, so my name is Christian Reimsbach and I work for the OECD, particularly on the project called new sources of growth, knowledge based capital, which looks at data as a source for growth and as a source for innovation, and it's a project that is involving different parts of the OECD, including our colleagues working on health, including our colleagues working on science and research, as well as our colleagues working on public governance, and the idea is to look at how data and the use of little data and big data can promote innovation in those different fields, in the area of health, in the area of public governance, and in the area of science and research. So we are looking at this new potential, but at the same time we are also looking at the risks that come with that, and obviously privacy is an important one. Issues related to consumer protection as well, and we are also looking at issues related to skills and employment, because we believe that there are some challenges related to the use of big data related to skills, and not only do we need people that have the skills to treat and analyze big data, but we also have to think about what the implications of big data can be on employment, and later in my intervention, I would like to make a point and focus on open data because this is an area where we believe there is a lot of potential, particularly in the context of development. Here we are talking about open data, not only from the public sector, so open government data, but we are also talking about aspects of public sector and private sector data. So how can we use data from the private sector to promote development.
Maybe I would like to stop here and focus on this during the discussion later on.
>> MODERATOR: Excellent and thank you very much and do stay with us so we can come back to these matters.
Do we have the possibility to get back to Robert so he can complete his initial statement?
>> ROBERT KIRKPATRICK: I am dialed in. Can you hear me?
>> MODERATOR: Yes, welcome back.
>> ROBERT KIRKPATRICK: Very good. I'll do my very best not to disappear again.
So at any rate what I was saying is while everyone around us seems to be struggling to make policy decisions with statistics that are many years out of date and the high level panel for 2015 has called for a data revolution, in a sense the revolution has already happened. There's all of this data out there. People producing it by going about their daily lives, search, transactions, communications, money transfers, borrowing and repayments. All of these happen over mobile phones via SMS in developing countries. So we see a tremendous opportunity to adapt these innovations to the fight against hunger, poverty and disease, to produce a new evidence base for impact, new approaches to governance, ways to empower community and hopefully increase the effectiveness and efficiency of development.
Our hypothesis is pretty straightforward. People are using digital services all around the world to meet their basic needs at the household level, and when their needs change, they change how they use these services in ways that we can learn through analysis to recognize. What patterns are left in data when people lose their jobs, when they get sick, when they begin to struggle for food and medicine and to meet basic necessities.
The classic example where a lot of this started back in 2007 was with Google search. I don't know how many of the audience have ever used Dr. Google. It's the first thing people do when they or a family member gets sick is they search for information on line about their symptoms. This has been shown to predict the outbreaks of diseases like dengue.
Twitter. We're working out of Jakarta. Jakarta produces more tweets per day than any city on earth. And when you filter out the celebrity and sports chatter, what you find underneath is a lot of content where people are talking about the unaffordability of food or fuel, job loss, symptoms of diseases.
And finally I would point to data from mobile phones. Mobile carriers can see the population of a country moving around in real time on the map. Now, all across the UN, we have maps of poverty, we have maps of disease outbreaks, crop yields, but we can't see the people. But with a mobile carrier, as people carry their devices around and communicate, we can see where people are moving, and this turns out to be very valuable because you can see, for example, the daily commute to work and when it stops. You can see the patterns of migration, where people move after disasters, find ways to optimize your transport network because you know where the traffic jams are. Or model the spread of malaria. There's a lot of potential here.
We think big data is the greatest opportunity to present itself to global development in many, many years, unless you fail to protect privacy in the process, in which case this may be the greatest threat to human rights the world has ever known.
And I fully agree with Bill. Anonymization is really, really, really hard and the research is constantly suggesting that it may be more than really hard. It may ultimately be impossible.
You know, we see an opportunity here to develop a framework for using this information for good in a way that protects privacy, and in a way that we hope could contribute to a change in the public conversation around big data, because today it's incredibly polarized. On one end you have regulators concerned that reuse could represent potential misuse, and on the other end you have companies pushing the envelope to do everything they can with this data.
We see big data as a raw public good, but we know you need to learn by building and by testing and by experimenting how to address a specific challenge while protecting privacy.
Another piece I'll mention is that a lot of this data is locked up behind the firewalls of corporations, not the stuff that's on line, like Twitter and a lot of social media and on‑line news, but communication patterns, interactions, transactions. This kind of information, companies are using to compete, and so we've been engaging with them in an idea we call data philanthropy, helping them to work with us to understand what data they can share in a way that protects both privacy and their own business interests, yet still could give us the digital smoke signals, those real time indicators that something is happening in a part of the world that we need to understand better.
Just to summarize, we function as a service to the UN system and member states. And we do joint R&D projects, leveraging partnerships with private sector and academia who have the data, the technology and the expertise in analysis and privacy.
So thanks very much.
>> MODERATOR: Thank you. Thank you very much, Robert. And do stay with us, because I hope that the discussion will evolve further. But the message was clear. There is a lot of good out there in big data, but it will work or not depending on whether we master it, and whether we manage to protect privacy.
Now, I would like to turn to Jochai and to Alex now to tell us more about the ‑‑ maybe the risk areas that we may be confronted with, and whether they are satisfied with what they are hearing. There is a lot of good that can be done, but ...
Perhaps Jochai first.
>> JOCHAI BEN‑AVIE: Sure. Happy to.
My name is Jochai Ben‑Avie. I'm the policy director at Access. We're an international NGO that defends and extends the digital rights of users at risk around the world. So there's the growing dependence on technology to connect us, to conduct business, for development, for government services. It's exponentially increasing the amount of data being collected about us. As Bill noted, the cost of storing and mining user data has fallen precipitously.
Big data in governments is used to predict food and disease trends, improve education, track political movements, all the good things that Rob and his team and other folks are working on. Big data can also get really sort of scary, with very deep violations of user privacy, very quickly. Earlier Jan talked about this notion of personalization. Big data is also great to give you a web experience that's closer to what you really want and gets you the content and the stuff and the things that you like.
Let me tell you a story. There was a girl, she was in high school, and she had probably given consent, perhaps by using Target's website. Target is a big box store in the United States. And she sort of is looking at different things, browsing products, and Target then analyzes the data that were pulled together from her traffic and starts sending her coupons, you know, to her family's house. And her dad walks into the store with one of these coupons, outraged. He says what the hell is this. It's a coupon for diapers and baby formula addressed to his daughter. And Target had been able to figure out that his daughter was pregnant before she said anything to him, and this is a high school student. This is a real story.
Many advertising companies boast that big data, while generating profits, creates this consumer experience for us, but I think we also have to remember that on the other side of that, you have the creepy violations of privacy but you also have discrimination, all right? That these companies learn a lot about us and with all this information are capable of making very important decisions on our behalf, like determining our credit rating or insurance rates or even eligibility for a particular job.
Let me tell you another story. Let's say you use a music sharing service like Spotify or Grooveshark, and you might logically assume that that information will be used to recommend music to you, right? That's what you're doing. You're okay with those algorithms. But that information can be used to guess at your racial background, your class. It could be used for other purposes, to deny you a loan, and sometimes the companies are wrong too, and even if they are right, is it any more acceptable? And where do we draw that line between personalization and discrimination?
I know we're running short on time here.
The other point I want to make is corporate collected data is also the fuel of the surveillance machine, right? As this summer's revelations made us all too aware, instead of conducting targeted surveillance along the lines of criminal law enforcement generally that looks at due process principles, governments are simply collecting the haystack, and so user privacy in an age of big data, we need to be looking at the data that's collected by corporations. As Bill was sort of saying, a lot of stuff probably should just not be collected or that we need to have very strict limits on for what purposes it can be shared and so forth.
The European Parliament's LIBE committee voted on the data protection regulation in Europe yesterday. The DPR has a lot to like in it. It provides provisions for explicit consent, for privacy by design and by default, the ability of data protection authorities to levy pretty significant fines.
The law contains huge gaping holes as well. Companies may engage in profiling as long as that data is pseudonymous, which as others on the panel have said is an increasingly meaningless term today. And two, the DPR allows companies to process your data without your consent if it is within their legitimate interests, which is a super vaguely defined legal term that gives permission to data controllers to share your information with third parties.
So whether it's discrimination based on our web activity and habits or governments tracking our every move and looking at everyone we've ever known, these violations of human rights from big data come down to sort of data collected by companies, and so we need strong safeguards, and we need them now.
>> MODERATOR: Thank you very much. That's a clear and compelling message as well.
Can we turn to Alexandrine then and see what her take is on this and especially perhaps if there are particular groups in society that would be vulnerable to the use or misuse of data or big data in that respect.
>> ALEXANDRINE PIRLOT DE CORBION: Thank you for inviting us on this panel.
Just a quick word about Privacy International, for those who don't know us. So we're the first organization to campaign at the international level specifically on privacy issues and we work on an array of different issues and with different professionals to investigate and advocate for strong national, regional, and international safeguards of the right to privacy and personal data.
I mean what's been said so far in terms of the positive side of big data, we don't contest it. It can have a positive impact, and it has been shown, especially in the developing world, in terms of accessing education, healthcare, the delivery of aid, particularly in post conflict or conflict situations, so that's been an amazing progress. But I share the same concerns that were raised by Access and specifically in terms of, you know, the context in developing countries. We have to look beyond, you know, the economic and the social development. There's also the human security elements, and with the use of big data, we're losing that part of development because we're challenging, we're putting at risk the human rights of individuals that we're supposed to be helping. So that's something that we feel is not taken into account when designing development programs. There really is an impact on privacy, which is also linked to human rights like the freedom of expression, of association and movement.
So one of the things that's really concerning, and it's linked not just to the developing world but to big data in general, is the discriminatory and exclusionary nature of big data. So what I mean by that is the data collected is from people that are active, you know, on the Internet, that take part in Facebook, who buy on line, who maybe have a mobile, so all of this data is brought together, but it excludes the ones that don't take part in these activities, whose behavior, decisions, and needs are completely excluded from decision‑making processes that use these big data programs.
And for example, like taking the example of Africa, I mean in some countries, there's less than ten percent of the population that's connected and using the Internet. So what does that mean about the decisions and the policies that are developed based on the data that's collected on line?
Another element we wanted to bring forward, and it's really important specifically in developing countries with sensitive political, but socioeconomic contexts as well is the potential for surveillance. So this big data is generated, and there's a possibility to draw conclusions and develop patterns of behaviors and profiles, and the aggregation of this data means that certain elements of somebody's identity can be revealed even though the individual had not consented to this data being given.
So for example, it's not because you agree to share data about ‑‑ that you have a Facebook account, that you are on line and that you have a mobile. From this data it's possible to identify maybe what ethnic group you're from, what religion, what is your sexual orientation, and in certain continents, the ability to identify these really intimate criteria about an individual and what makes them their identity can have really in some cases tragic consequences.
So these are all elements of concern that we want to raise. There are many more and I'm sure they'll come up in the conversation.
>> MODERATOR: Thank you very much. You've introduced a few additional elements of risk. Not only we were talking about privacy and so on, but now we have a linkage to freedom of expression, association, even the right to freedom of movement through monitoring of roaming mobile devices and so on and so forth.
We have not only a big brother situation, but a set of big siblings out there that are looking into our data and collecting and processing and monitoring our conduct.
I would like to open the discussion to the rest of the participants in the room, but before I do that, I would want to turn to Marie again and see whether she would say, okay, we've heard good and bad. Are your concerns mitigated? Are they increased by what you have been hearing? And do we have any solutions? Can we look towards the future with some hope?
>> MARIE GEORGE: Yes, we have some solutions, but maybe we also need some more that nobody for the moment has seen. Of course when we talk, and I hope our friend from the UN is here. I recall that the UN adopted in '90, unanimously at the General Assembly, guidelines on the protection. And we show how it can be used.
OECD also has guidelines, so before saying that we have nothing, we should look at what we have.
I was very interested by what our colleague here said first, default, our need to collect data. I'm sorry, what I say here is personal: we don't have the technology that we need socially. You cannot do anything with Internet or with your telephone without leaving traces. This is not normal. In the real life, when you meet people, after a while you ask who are you, after a while maybe what is your telephone number, before asking even where do you live. You see this kind of thing, which means that in the real life, we need every tool for meeting people without being ‑‑ revealing who we are. When you search for information on a search engine, it's like being in the library. You can look at everything. Nobody has to know what you are looking for. In the real world for the moment, it is not the case. And we will go back to that, because it is very important in relation to the right of privacy, information also.
So you cannot be anonymous when it is needed and the freedom that we all knew.
And then we need different kind of identification when it is needed. When you have to pay, why do you have to say who you are on your card number. We don't have universal coins on Internet. In the life you can pay anonymously. You don't need to say who you are. You pay. That's all.
Okay. You understood.
So I don't know when the IT community will help us in that sense, but we will ‑‑ we shouldn't need all these different devices.
An American living in the Netherlands, about 10 or 15 years ago, he told me we should send our requests to everyone so nobody knows to whom I am talking to because everyone gets it. But only one will get it, because it will be encrypted, and so the other one that you are looking for will know who you are. That's a way to be anonymous but it's not enough. But this is impossible for the moment. We don't have the wires and everything for that. So we need that.
So after that, big data default. I completely agree. On Google we should be able to look for something without leaving any trace on the record. We should send no data collected.
Now, big data regarding individual decisions. Big data means data mining, profiling. You will look at the modernized Convention 108, the directive in Europe in '95 and the French law of '78 already.
Profiling. There is a recommendation from the Council of Europe from 2010 on profiling. Very interesting. So when people are looking to take a decision about you on the profiling. Of course it has to be forbidden in judicial decisions. Only the French law has that. No international privacy instrument says that it is forbidden to use profiling techniques to decide the amount of penalty. It is not said anywhere. It should be.
Okay. In the profiling. The problem is that no one of us is according to a profile. And what did they detect in the profile? They found in big data. Only what they had in. Maybe you are something else with some other information that you could give and which could lead to another decision. So in profiling, you need to know on what basis is taken the first decision. Now there is safeguards in Europe. You can always contest. You can contest the decision, but you have to move.
Okay. And certainly in some countries, any profile is submitted to the authorization of the DPA, and I can give you examples where there was no authorization. In a bank score, scoring, to know how much we are going to give you for a loan. They found in banks that when there is a huge difference of age between a husband and wife, there is problems on the account, the more it is. And the DPA refused to take this in account, because you have the right to marry anyone, and this should not have any economic consequence.
You can think of many situation in which ‑‑ I mean with this example, if you look at other profile that are done, it will give you the idea of how to think about. Also now with genetics and with huge data files, you can even predict what a person may have as a disease. What do you do if you don't know how to treat that disease? Do you tell the person or not?
There have been an experience for making ‑‑ to know what is the DNA. So samples of a patient had been taken from doctors, and at the beginning, and they show that some people could have, and they were asking does the doctor of this patient has to tell him. One DPA decided no. If you don't know how to treat someone, why should you give it ‑‑ the information if it is not required, of course. But think of that.
Yes, it's true that we may be in a changing world. If we are going to tell each of us with all the prediction that all the big data is going to give us. We have to be careful.
Now, on research ‑‑
>> MODERATOR: Marie, can we move on? Because I would like to be able to have the opportunity to interact also with the rest of the participants. You will have the opportunity to come back.
I have the impression that we have been assuming a lot. We assume that in the past we benefited from privacy that we don't have anymore. I would advance that privacy never existed. That the moment we go out into the street, privacy doesn't exist. We are seen by people. What has changed is the latency of the information in respect of that. How long does it stay recorded somewhere in someone's brain? It has changed in respect of the ability to retain it, to collect it, to aggregate it, and to process it, but in reality, privacy very seldom existed or existed in very marginal and restricted areas, even in respect of payment. The moment you pay cash, it doesn't mean you are anonymous. It means that the transaction is recorded in someone's mind for less time. So privacy in that respect never existed either. It is the difficulty of collating and correlating that information with other data sets, if you want, that has changed.
Can I turn to the rest of the participants? What is your take? What are your questions? What are your observations in respect of what we have been discussing?
There are a number of requests from the floor. Do we have a roaming mic? Please can you give it to the person closest to you, and perhaps we can take another microphone to the other side so that we can have a quick switch afterwards?
>> AUDIENCE MEMBER: May I use this?
Thank you very much. It's a very interesting discussion, primarily about private sector use of data. But I think that the big elephant in the room at IGF this year may very well be the revelations about the U.S. National Security Agency and the amount of data it has not only collected but allegedly used. So I'd like to get the panel's thoughts on that.
And just as a disclosure, I'm an American journalist and will be writing this for an American newspaper so what you say may be on the record, but it's on the record anyway. But thank you.
And if you could identify yourself, I would appreciate it.
>> MODERATOR: Thank you very much.
Can we collect more comments first, and then we will have a tour of responses?
Someone said from the panel earlier that also the corporate collection of data feeds into the surveillance machinery, so we have that idea as well, the interlinkage between the two.
Please go ahead.
>> AUDIENCE MEMBER: So big ‑‑ I'm sorry, John Laprise from Northwestern University.
So big data is just noise without the statistical tools to analyze it. And I've heard very little discussion from the panel about the availability of such tools, the complexity and sophistication of the tools, which is not evenly distributed. Obviously in line with the previous speaker it's apparent that intelligence agencies have better statistical tools for analyzing big data than perhaps corporations do. Could I have the panel's input on sort of the other half of the big data question, which is not just the raw material, but the tools necessary and the skills necessary to use those tools? Thank you.
>> MODERATOR: Could we move the microphone to the other side, please? Yes. And if we could bring one forward, there's a number of hands up on this side. If we could make sure that we can switch quickly to them. Please go ahead.
>> AUDIENCE MEMBER: Yeah. Just a response to the assertion that privacy never existed. Just I think that's a big failure to distinguish between the public and private spheres, and privacy is the right to reveal selective information about oneself. So I just maybe want to put a hypothetical to you all. Before the existence of the Internet when you walked through the street, would you have unwittingly revealed your sexual preferences, your health, how you like to have sex behind closed doors? So by going out into public, you don't lose the right to selectively reveal information about yourself. Thank you.
>> MODERATOR: That's another aspect. The right to present oneself, to show an image of oneself, to construct an image of oneself that is presented publicly.
>> AUDIENCE MEMBER: I have a question to Christian Reimsbach of the OECD but also to the other panel members and that is a question on how do we actually encourage companies to release data into the ‑‑ for the greater common good? I'm thinking particularly for instance security companies for climate change data or other data that has not only corporate value but has actually more value if it is released in the public domain. How do we encourage the companies?
>> MODERATOR: Very good. Thank you very much. Somebody else has the microphone.
>> AUDIENCE MEMBER: Yes. Valentina from Bosnia. I'm interested about the frame, because all those data are collected massively, and whenever you collect something, there is an assumption and I'm wondering who is framing the assumption, and I think that we still, with big data, we have the risk also to mainstream a very status quo of the society. We never mainstream the diversity, the alternative, because there are also minorities. So for me it's also important that big data is a big strength and one we can change, so who is framing? And algorithms can be very sexist. And we see when you Google about women, it's not very progressive. It's just giving you the images that women has to be, which always we have been. Thank you.
>> MODERATOR: Thank you very much. That's another interesting angle.
Someone else has the microphone up in front.
>> AUDIENCE MEMBER: I'm Bastiaan from the Netherlands and I heard somebody say it should be possible to search on Google without giving up your privacy. But the thing is there were 50,000 people at Google who make possible the things we are using, like Gmail, Google search, and they need to get paid in order to keep ‑‑ in order for us to keep using that services. When Google has no money, we won't be able to use that services. And they earn money by using our privacy. So we should either give up our privacy or not use Google. It is impossible to use them both. And I think that's a misunderstanding.
>> MODERATOR: Thank you very much. That's an interesting point.
Can we move to this side so that we don't only hear one side of the room. I'm sure there are no sides here, but at least in terms of location. There is a lot of interest. I would like to take a couple of more comments so that we can do a quick round on the table and with our remote participants and then go back to you again. Please go ahead.
>> AUDIENCE MEMBER: Thank you very much. Thank you for the great panel. I would like to extend the question raised from the lady from Hong Kong. The information that is collected by companies can be some of it released for the better good, but in many countries, the most of it is used by government agencies and ministries. The ministry of transportation and ministry of finance and ministry of health. Lots of information is already available to these agencies and the public does not have any control about what information is released and to whom and what use is it used for. So I think it's tempting that the paradigm, extending the inputs to the public sector, to the government is also very important for the information they collect. Thank you.
>> MODERATOR: Thank you. Someone else had the microphone on this side. Please.
>> AUDIENCE MEMBER: I think there is a focus on the use of big data between governments and companies, but now also academia is drawn into the subject as well. There has been a lot of programs from academic institutions that try to analyze big data that is produced by activists on line in different parts of the world where there are crises happening to collect and analyze a narrative on the political situation in that country without really applying the rules of research, of humanity research and using the data of the subjects too. Especially these tools of analysis and algorithms that are used to analyze the political situations are done in closed rooms. So I was wondering what are the panelists' opinions on the use of the practices of academia to analyze different usages or different activism on line to produce political narratives.
>> MODERATOR: Thank you very much.
There's someone who has been raising their hand over there in the middle. Please could you pass to this person? Yes, thank you very much.
>> AUDIENCE MEMBER: Hi. Thank you. I'm Linnet Taylor from Oxford University.
I have a question about anonymization which I think is the other elephant in the room here, because those of us in research know that, and I think that everybody in the big data science sphere knows that anonymization is not actually fully possible in the age of big data and another data set will always come along that could be linked or merged in the future. So rules for ethical research, which the previous commentator was mentioning, now include the requirement that you predict the existence of future data sets and technology which might deanonymize your research participants. This is not possible. What do the panelists think about that?
>> MODERATOR: Thank you very much.
We have a number of issues that have been raised. I won't give the floor anymore to the participants on that side. I would like some reactions here. We have had remarks in respect of security agencies, of uneven distribution of the tools, and the capabilities between different entities. The question of privately held data passing to public hands for the public good as well. Assumptions in respect of society that can model future and perhaps the need to move away from those assumptions and allow society to evolve in its own way. The way we pay for services which appear to be free, and which in terms of staff and resources are very costly out there. Do we have different ways of doing that? There is the question of supervision. It came through in different ways. And there is the question of political developments and activism and crisis management, which can be assisted by big data. And the question again related to supervision, the question of ethical values, how do we introduce an ethical dimension into the use of big data, and is it enough to have ethical values out there that are agreed to or is it necessary to be able to influence, to monitor, to supervise the way it is handled.
I turn to the panel. I will also ask the remote participants to intervene, and I would like them to pick what question they would like to answer very quickly, perhaps in tweet form, to your various remarks. We can take it along the table.
>> ALEXANDRINE PIRLOT DE CORBION: Yeah, I'll respond to a few of the issues raised, the first one regarding the Snowden revelations. It's true that these revelations have cast light on how much data is being collected, and without a purpose, which as Marie George mentioned, it's the basics of the regulations, like what does the data that you're going to collect do, what is your objective and you need to justify that to the data owner when you collect their data. And the way that big data is being developed, there is no way at every step of the process to get consent.
And also what has been revealed about ‑‑ with the Snowden is really the idea that this data will be there in case it can be used one day, and that's really dangerous in protecting the right to privacy of individuals, because it's not using it in a specific point in time to develop a policy or for law enforcement. It's in case one day something happens that you might have to resort to using that information, and that's again linked to the issues of consent.
I'm a bit worried about the question about how to encourage companies to reveal the data, because the point is for the private companies to have privacy policies to protect their customers, so I wouldn't advocate for companies to just reveal all the information they have about their customers. This has to be done in a manner in which the data owners understand what their data is going to be used and it shouldn't just be putting all the information out there to be used even if it's for a social and, you know, development purposes. Because at the end of the day, you're violating the privacy of these individuals, even if the end goal is positive. So I think that's something we need to take into account as well in the short term but also in the long term.
So I think I'll stick to that for now and then pass on.
>> MODERATOR: Thank you very much.
>> MARIE GEORGE: On the Snowden business, in my view, if it continues, if the orders do not stop after a while, people will be very afraid to use any kind of IT. In the '70s, when the first laws came up on data protection, IBM ‑‑ IBM was the Google of today at that time ‑‑ made an international study and saw that if it did not support laws for better protection, there would be a rejection of IT.
Today in my view, we are at the limit. And I hope the U.S. will give good news after what happened, because it is not possible. It is against international law. You cannot spy on another state. It's a question of sovereignty. And here it was not for terrorism. It was for economic spying. I could talk about that a long time. Political and economic spying. So this is completely against and even with the lies, which is terrible. So we are in a war, an economic war.
In that context, what about big data and our safeguards? First as I said, the more you have that type, the less it is possible to anonymize. Everybody says, so? Still in Europe we think it is possible. I saw yesterday the compromise, they talk about ‑‑ as it was a safeguard. For another purpose and the one for which you gave your data, if it's for research and all what we said, if it's for another purpose, you have to give your consent, clear, and not one consent for many things. One consent. You may not like your data go to secondary research. That is your grandfather or your uncle. So this should be possible.
Secondly, when we are talking about research, about private or public data, today, and as I said, the more you have that on an individual, the more they are sensitive. For the moment we still have categories of sensitive data. Tomorrow in my view, all the records you have will be all ‑‑ has to be treated as sensitive data, which means in the area of sensitive data, like health, what do you have? You have in many countries, and I think in maybe all now, an independent body which looks at which research the people want to do. Easy to confirm with public interest or not. I'm very afraid of private studies, not published, sold by big enterprise which have millions of individuals' data that will ‑‑ could make any kind of data or any kind of research even against the public interest. This is not possible. It's for money. That could really happen. So we have to invent in this area some kind of procedure to be sure that the studies which are made are on the public interest. Maybe everybody doesn't know how to do that here. Look at what is done in the health sector, for instance. I mean we have to see what exists already and we could say that.
>> MODERATOR: Thank you very much. If we can move on so we can come back to the floor again.
>> BILL WOODCOCK: I'm going to try to speak very quickly because I'm going to try to address three different points here. First one of the things that you said, Marie, the need for tools to allow disclosure and anonymity. I couldn't agree more. We have very few tools in this area. There's not enough competition between tools to ensure that we have good development and progress in this area. The problem is that because there are so many people who value people's private data, who feel that they can put it together with other things in order to turn a profit, there's an economic incentive for the data to be collected, and that's what powers companies like Google, right? That's what gives them the money to hire the programmers to do the development work, whereas the tools for not collecting the data, the tools for preserving privacy, there's no economic incentive to anyone to not collect data, right? This is a default position which should exist, but for which there isn't a financial benefit to any individual company, any individual software developer, for instance.
So we have this, but we have it as a result of people who are working in the public good who are giving up their time and their energy in order to make the world a better place. And so this is something that is perfectly feasible in economic good times, when we have a potlatch economy, an economy of plenty, and people can do good works and know that they will be taken care of and they'll have food on the table tonight, despite the fact that they spent all day working for society rather than working for a profit. And the problem is that since 2008 we've just had a pretty rotten economy globally, and so the open source software development, open standards development, have both suffered during that time. And so I think, you know, we had a lot of work that happened during the dot com boom, right? There was a lot of money flowing into industry. A lot of people happy to take a speculator's money and use it to do good work and then declare bankruptcy or whatever. And then during the sort of mid 2000s, there was a slack period when the economy wasn't too bad and a lot of people were kind of sitting around waiting for something to happen and got a lot done. But in the last five years, I would say very little has occurred in this area, and it really needs to. And I think this is an area in which philanthropy could be a lot of help, and I think it's an area in which governmental spending on supporting open source software development could be a lot of help, because the private sector is just not going to do it. There isn't a profit motive there.
The second point I was going to try and make is addressing this question about the notion that there's a debt owed to Google. I think these kind of tie together, right? If you're not the customer, you're the product. If you're using Google services, then you are being sold to their actual customers. If you want to be Google's customer, there's no problem. You can go and spend money on Google AdWords. They will happily serve you. If you're using search, it's not a question of you owing them something. It's a question of them selling you to their customers, right? You don't owe them for the fact that they employ software developers to write profitable tools to do advertising, right? That's just like saying you owe advertising agencies your time looking at their advertisements in a magazine, right? Yes, it's the advertisements that paid for that magazine, but you don't owe them your time staring at a picture of a bottle of Scotch, right?
The third point on how governments use big data. Just to note that I think there was a FOIA request that was just publicized in the New York Times this morning about how the Transportation Security Administration in the United States is now using big data to prescreen travelers before they even get to the airport, and they're using, in addition to the information they were already using, car registration information, employment information, tax payment information, property ownership information, and past travel itineraries, which is a lot of different data sources to correlate, and they were already doing that kind of thing for international travelers arriving in the U.S. Now they're doing it, you know, if you just want to take a 20 minute flight to the next town. And they keep trying to expand to buses and subways and things like that. So it's very hard to know where this will stop, and I think it's important to remember that ultimately we're all paying the cost of this, right? It's not just their time that's being used up in doing these kinds of correlations. Their time is being paid for with tax dollars, and that's true of all governments.
So you know, the question of what governments choose to do, it's not academic. It actually has a cost and the alternative is better healthcare, better public transportation, better schools, right? Spying on people comes at the cost of all these other kinds of public services that have some measurable benefit and I don't think anyone has done any studies yet showing big social benefits to spying.
>> MODERATOR: We are running out of time. We have still 12 minutes. I would like Jochai to react, and I would like to give the opportunity also to Robert and Christian, and then I would hope that we still have some time for the rest of you. Jochai.
>> JOCHAI BEN‑AVIE: All right. I'll try to keep it pithy. Bill took my New York Times article that I was going to say, but I think what that highlights is again that the data ‑‑ big data is being used certainly by corporations to maximize profits. But it is also again, and I repeat my line from earlier, corporate collected data is the fuel of the surveillance machine, so whether it's the TSA or NSA or other intelligence agencies, the DEA working with AT&T that collects over 30 years of data on the Hemisphere Project. At the end of the day we're dealing primarily with the companies collecting the data and then the government getting it from the companies, so I think that's really at the heart of this.
The other thing that we haven't really touched on is a lot of countries disambiguate these two things, right? So in the European context we have the data protection regulation which mostly deals with the sort of corporate user interaction, and then we have the data protection directive that deals mostly with what law enforcement can do with user data, and I think that we really ‑‑ given the sort of inherent connection between these two things, we shouldn't disambiguate them. We need to have comprehensive data protection. And further I would add a lot of countries don't have a data protection bill at all, and that's really something where we need to be doing more. Particularly in developing countries, there's a dearth of comprehensive data protection bills. And so I think that we're seeing two major standards in the data protection regulation in Europe and the Convention 108 modernization going on in the Council of Europe as sort of models there.
Just trying to hit on a couple more points. You know, in terms of this sort of ‑‑ the what about the tools to analyze data question. I think that's exactly where the field is in this sort of ‑‑ dumb data is just a haystack, right? But there's all this work on smart data right now to try to build algorithms and tools and so forth that are really making this no longer a matter of statistical analysis but of using a pretty easy to use graphical user interface, and so it's becoming easier and easier to violate people's privacy in that way.
Moreover just raise your hand, who here has heard of a company called Acxiom? So we've got like five people. So Acxiom is a huge data broker and so it's not a matter of having to have the in‑house statistical capability to make use of that. You can just buy profiles and an Acxiom profile is like a hundred bucks or something. So I think we need to again consider that part of the equation.