Google Open Forum
25 October 2013 - An Open Forum in Bali, Indonesia
This text is being provided in a rough draft format. Communication Access Realtime Translation (CART) is provided in order to facilitate communication accessibility and may not be a totally verbatim record of the proceedings.
>> MEREDITH WHITTAKER: Hey everyone. We're going to start in just a second. But first point of clarity, this is -- this session is titled Google Open Forum, and that's kind of an odd Internet Governance Forum naming convention that came post hoc after we had submitted this presentation. This is a joint panel hosted by Citizen Lab and Google Research, looking at measuring Internet rights and openness. So I -- that's fascinating and I hope you are pleasantly surprised if you were here for the Google Open Forum.
We're ready to go?
Cool. So thank you everyone for being here. I am delighted to see this turnout and I'm really honoured to be able to present this panel. These are some of the experts who are thinking about this topic, and you know especially Citizen Lab has really led the way in the type of research that is looking at combining these sort of sociological and political aspects of human rights reporting with, you know, measurements and research and, you know, understanding the technological components of how networks work and how the Internet functions in our lives. So this is really exciting for me.
I am Meredith Whittaker. I am a programme manager at Google Research and my focus is measurements and open data. So that can be translated into trying to understand what is happening and trying to communicate it in a way that can be verified by others. So you're not taking my word for it. You're looking at what the facts are and you're able to contest my claims and make your own conclusions.
When we're talking about measuring Internet rights and openness, you can think of that in terms of the traditional human rights model. So reporting has always been a part of that: gathering evidence, gathering accounts of human rights abuses, you know, gathering photographic evidence. And as we moved into a time in the world in which a lot of things began to happen online, in which a lot of abuses may have been facilitated by things that happened with networks, things that happened online, you know, the question really became how do we know? How do we gather the evidence we need to figure out what really happened, to make the case for these types of abuses?
And in that context you can think of measuring, you can think of measurements as really evidence gathering, as a way of looking at these networks, looking at the bits on the wire and deducing from that what was happening and what was the impact on real humans in real times in the case of, you know, rights and openness.
So this is connecting Internet research with human rights reporting, and I have on the panel a number of people who are thinking about this from a number of different angles.
So I want to start with Tim Maurer, who is a policy analyst at the Open Technology Institute, and can make a clear connection between this type of research, which may seem very arcane to people, and the real policy implications. So Tim?
>> TIM: Thank you, Meredith, and good morning everybody. My name is Tim, and I'm here to represent the Open Technology Institute, which is part of this collaborative effort.
And I work -- I'm part of -- just in terms of background information, the OTI is a nonpartisan public policy think tank in Washington, D.C., in the United States, and I'm part of the team for the Open Technology Institute.
So the reason why I am on this panel is because I can speak to the importance of open data and measuring the network for the policy work. That is my day-to-day work.
I've noticed since I've been involved in this space that the need for empirical data is important to inform the policy development. As some of you who have been in Washington might know, there is a lot of policy debate that is not necessarily grounded in empirical research, but this is an area where the more data you have, the better your policy recommendations.
To make that more specific, here are two examples from the work that my colleagues and I have been engaged in over the last year or two. The first is broadband policy. This kind of data can be used to verify whether consumers are actually getting what they pay for: by measuring the network, you can find out whether the service that they pay companies for is actually what they receive. And that has direct implications for our policy work, specifically when it comes to domestic policy in the United States, and it's been a critical tool.
I work specifically on export controls, looking into whether existing export control regulations might need to be updated for the digital age. As you know, a lot of what we have seen coming out of certain countries after the Arab Spring, and a lot of the research that the Citizen Lab has done, has shed light on new technologies that are used for surveillance and censorship. And one of the things that the efforts of people represented on this panel can help with is to identify where those technologies are being used, to then inform the policy recommendations and analysis of what kind of changes might be necessary. But once those changes have taken place, how do you actually have a continuous research effort ongoing that helps policymakers to decide what kind of technology they should be looking at? Because as you know, the spectrum of technology is so wide. And if you have regulations that are overly broad, you actually end up having a potentially negative effect in terms of what you're trying to achieve. So having that empirical data is very important to eliminate the negative, unanticipated results.
So that is why the Open Technology Institute has supported the Measurement Lab and other people on the panel. M-Lab is a platform that allows totally open source measurement. It's open data. It's a global platform, and it's an open source measurement methodology, which is why we are trying to put this out there as a tool and resource for other researchers who might not be directly linked to the collaborative platform but can have access to the data to come up with their own research. And that's where Collin's work is a prime example of the kind of effects that you can have by pursuing data that is then openly available.
>> MEREDITH WHITTAKER: Great.
>> TIM: There's currently 800 terabytes of totally open data available. Some regulators are using it. Researchers are using it. And it's increasingly a valuable resource for panelists like myself.
What is really magic is because the data is openly available, you have other people looking at it and connecting it to their own research. And Collin will talk about that in a bit, and I think that's really a critical example for anyone in the room who might be interested in using similar data to reach out to us later on to find out how we can connect our work.
>> MEREDITH WHITTAKER: Thank you so much. I think that's a great introduction. That really frames why this is important beyond simply publishing research papers, beyond doing research. And I'm now going to turn it over to Dominic Hamon, who is a research scientist I work with closely and he is familiar with the Measurement Lab data and network data generally, and I think you can sort of discuss best practices for measurements and frame this in a more technical scope before we move on.
>> DOMINIC HAMON: I'm one of the few technical resources here. Tim did a good job of introducing the platform. It started out as a way for researchers in academia to do really good global research, collect network performance data, and then access other people's performance data. And what we found in doing that was a number of side benefits that we didn't expect, some of which you'll learn about later on the panel.
But I wanted to talk a bit more about some of the best practices for how we make sure people collect data, and why it is important that it is public and open. So when policy changes are made in a vacuum, it can be at best misguided, at worst dangerous. When you have policy changes that are informed by data, that can be really powerful. Now when you have policy changes that are informed by open data, public data, now you have something that is very powerful and measurable by others. And then when you have policy changes that are informed by public and open data, and the analysis is done using open source and the data collection is done using open source software, now you have something that is powerful, measurable and verifiable by others. And that's the best kind of policy change you want to make.
So there are three steps to responsible policy changes. The first is collect a lot of data before making a policy change. This establishes a baseline for, you know, where the world is. Then make a policy change. And then collect data using the same methods and compare them to your baseline. And this will tell you whether or not your policy change did what you expected it to do. Or if there were any unexpected side effects.
Now, you can take that methodology and imagine that you're collecting data in a sovereignty that is external to yours. You collect data for a long time, you establish a baseline. And then over time you notice a change in the data compared to the baseline. And from that, you can infer that there was a policy change in that sovereignty, even if it wasn't made public. That policy change might be one of censorship, one of throttling, it could be some other human rights violation, and that is the power of having the open data and open measurement and open analysis.
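The baseline comparison described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the throughput numbers are invented, and the function name detect_shift and the 30 percent threshold are hypothetical choices, not part of any Measurement Lab tooling.

```python
from statistics import median


def detect_shift(baseline, recent, threshold=0.30):
    """Compare recent measurements against a long-run baseline.

    baseline, recent: lists of daily median throughput values (e.g. Mbit/s)
    collected with the same method. threshold is a hypothetical cutoff for
    what counts as a notable relative change.
    Returns (changed, relative_change).
    """
    base = median(baseline)
    now = median(recent)
    relative_change = (now - base) / base
    return abs(relative_change) >= threshold, relative_change


# Invented example data: a stable baseline, then a sharp drop.
baseline_days = [2.0, 2.1, 1.9, 2.0, 2.2, 2.0, 1.9]
recent_days = [0.6, 0.5, 0.7, 0.6]

changed, delta = detect_shift(baseline_days, recent_days)
print(changed, round(delta, 2))
```

A flagged change says only that the data moved relative to the baseline. Whether the cause is a policy change, an outage, or a change in who is running tests still has to be argued from other evidence.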
So a bit more about what I do specifically. I do a number of different projects in network research, but I've spent most of my time with the Measurement Lab. And a lot of my work is making sure that the data collected is internally consistent, which allows for these kinds of longitudinal studies, right? It allows for the long-time baseline studies to happen. Making sure that the platform is broad and stable so that we have global coverage. Establishing mechanisms for easy access to this data. This is 800 terabytes of data. That is an awful lot of data. And it's not always obvious how to start analyzing it, how to start getting at it, how to find the needle in the haystack.
But one of the benefits is that we don't often know what some of the signals in there are, and we rely on other people having access to that data to find those signals and you'll learn about that from Collin specifically.
And the other is to establish good practices for responsible collection and publishing of this data. And what do I mean by that? Well, this is where some tension comes in. Because we want to make sure that this data is comprehensive. We want to make sure that this data is public, and we want to make it open. But when you start collecting data about people's network practices, or their connections from their smartphones, it's very tempting to start collecting data about things like their location. And you might want to assign every user a unique identifier, so that you can track individual devices through the network performance data. This can be very powerful. But it immediately starts infringing on those people's privacy. So we have a constant tension between wanting to get as much data and as good data as possible, but also making sure we're collecting it responsibly, such that we don't expose people to danger.
I think that's what I wanted to say for now.
>> MEREDITH WHITTAKER: Before I pass it to Collin, can you tell me how many iPods is 800 terabytes?
>> A lot. Five thousand.
>> MEREDITH WHITTAKER: That is a lot of iPods.
We are going to move on, and I want to pass it now to Collin Anderson.
And we're sort of narrowing the scope now. We introduced the principles of open measurement. We talked about best practices and why this is so powerful in decision-making. And now Collin is going to talk about some research that he did, using this data that, as Dominic emphasized, was initially collected to show network performance. To show researchers who were interested in how to tune, how to understand the global network, to show them how to do that. This was technical data for technical people. And by collecting it, by creating that baseline, it gave Collin the opportunity to do some really stunning research. So I'll pass it to you.
>> COLLIN ANDERSON: So I wanted to start off, there is another slide that might be more framing than just the article.
>> MEREDITH WHITTAKER: I have only this one.
>> COLLIN ANDERSON: So what is interesting is that States are increasingly sophisticated in the ways that they control networks. Whereas in 2009 you'd see articles and mechanisms for States. I generally focus on Iran, so I'll use Iran as a specific example. Because Iran has I think a strong tradition of being more sophisticated, more aggressive, and in some ways more creative in the ways that they dealt with the Internet.
So in 2009 the Iranian government struggled to deal with the popular uprising stemming out of allegations of electoral fraud. The immediate response of the Government was to shut down part of the communications networks. The mobile text messaging, for example, was down for 45 days within the first couple months. And what you saw across time was that when there are specific protest moments, networks were shut down. They would be more heavily censored. They would be subject to greater degrees of interference.
In the case of Iran, this became in some ways increasingly sophisticated, more narrow but more aggressive. So whereas you had that total network shutdown in 2009, in 2010 you would see interference with SSL. You would see interference with the Tor network. You would see DNS hijacking. You would see DNS hijacked and leading to a phishing site for Gmail, using a compromised cert. This was increasingly sophisticated, in a sort of morbid way kind of impressive.
And so along this narrative, what you also saw, tied with political moments, was this idea that the Internet was slow. And I think that this is interesting, because it's pernicious because it's a poor story, right? It's not tangible. It's just sort of, you know, we all have experiences where the Internet at our house is slow. But when you look at, you know, this mechanism of speed throttling, it's actually an incredibly aggressive, targeted move, right? Because this isn't simply Comcast is slowing down my BitTorrent. This is my morning routine is I wake up and turn on my antifiltering tool. Maybe by the time the tea is done you get connected. Maybe as I make toast I may be able to log into Facebook, and by the time my breakfast is fully prepared maybe I can get my news feed up on Facebook. And so this became the story associated with preplanned protest moments.
So just to back up, we have seen in some degrees what seems like an accelerating notion of disconnecting the Internet with political events. But I would argue that what happened in Sudan last month, what has happened in Syria a couple of months ago, is going to be a decreasing phenomenon. What you get out of throttling is something that traditionally has been difficult to measure externally. It's a very boring story. And then on top of that, it sort of achieves the same purpose, right?
It acknowledges, in this pragmatic way, the modern media landscape, which is that a state can't censor news of violence against protesters, against indigenous population, but today that violence doesn't exist unless there are media, pictures, videos, and similar things. And all of those things take much more bandwidth to convey than exists when there is a throttling regime.
And so what we're able to take, for example -- and the beautiful thing about Measurement Lab is largely because of NDT, the Network Diagnostic Test, which is bundled in with a number of applications. A very large portion of this 5,000 iPods is NDT data. So because everyone loves BitTorrent, and NDT is bundled with a popular BitTorrent client, you have the nodes of measurement across the world. In Iran there are something like 60 to 100 tests a day. And in places that you would -- that, for example, a researcher like myself could never go in, you find thousands of tests, even on a daily basis.
And so what we have now is we have this mechanism of accountability that was built in order for incredibly like relatively nuanced issues of Public Policy.
And so we can take this data, this measurement data, which had, you know, initially was -- seemed to be conceived of to -- for FCC and for European Union type purposes, and we can apply it. And so now we have a daily mechanism just to test and detect for throttling.
And so what we're able to do in this case is go back across -- the beautiful thing, this is more than 3 and a half years of data. So we have a very large portion of the sort of post 2009, post green movement period. We have a sort of demonstrable evidence that we can go back and say on this date, from this date, throttling occurred. It was you know, X percentage decrease. It lasted for this long. And then on top of that we can start taking a look at who the privileged people were.
And so to take for an example, and I'll conclude with this, so like I said, Iran has actually a very predictable censorship mechanism. Important dates, popular moments of contention, and included in this is elections. Especially an election, you know, this election that happened on June 14 was the first presidential election since the green movement.
So you know that something is going to happen. And so based off of that, we can sit with this data and we can measure as the Iranian Government institutes throttling the day of the official candidates being released, and not relenting on it until the day after the election results were announced and stability hit.
We can also take this for an advocacy point of view and then turn it into real public tangible results. So this graph, for example, is the electoral period. And what you see is an over 70 percent decrease in aggregate bandwidth. This doesn't begin to show some of the other forms of censorship that occurred, but it's very demonstrable over the control that happened.
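The kind of statement Collin describes ("from this date, throttling occurred, it was an X percent decrease, it lasted this long") can be derived from a daily series with a small amount of code. The following is a minimal sketch, not his published method: the dates and throughput values are invented, and the function names and the 50 percent cutoff are hypothetical.

```python
from datetime import date, timedelta


def summarise(run, daily, baseline):
    """Describe one run of days: start, end, length, percent decrease."""
    mean = sum(daily[d] for d in run) / len(run)
    pct = round(100 * (baseline - mean) / baseline)
    return run[0], run[-1], len(run), pct


def throttling_periods(daily, baseline, drop=0.5):
    """Find runs of consecutive days whose throughput fell below
    (1 - drop) * baseline. drop is a hypothetical cutoff."""
    periods, run = [], []
    for day in sorted(daily):
        if daily[day] < (1 - drop) * baseline:
            # A gap in the calendar ends the current run.
            if run and day - run[-1] != timedelta(days=1):
                periods.append(summarise(run, daily, baseline))
                run = []
            run.append(day)
        elif run:
            periods.append(summarise(run, daily, baseline))
            run = []
    if run:
        periods.append(summarise(run, daily, baseline))
    return periods


# Invented data: six days of median throughput in Mbit/s.
start = date(2013, 1, 1)
values = [2.0, 2.0, 0.6, 0.6, 0.6, 2.0]
daily = {start + timedelta(days=i): v for i, v in enumerate(values)}

for period in throttling_periods(daily, baseline=2.0):
    print(period)
```

Because the input data and the code are both open, anyone who doubts the reported percentage can rerun the calculation with a different cutoff or baseline and see whether the period still stands out.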
And based off of this, based off of using peer-reviewed, methodologically cited sources, we can then take this graph -- and this graph appears in an alternative form in the UN Special Rapporteur for human rights in Iran's report that was released a couple of days ago.
And there are something like three or four issues of Internet censorship that are included in that. And two are built off of open data sources that have open methodologies, that did exactly what we're talking about.
And on top of this -- last point, sorry. The Iranian Government reacts to every one of these reports very vehemently. And this is because they can create doubt in the methodology, in the data collection. In this case, what you can say is you can run the tests, here is the code, here are the calculations. You can run the test. How do you -- how can you disprove this data? It's difficult. And so everything that my previous two colleagues are speaking of, this is the power of that sort of sentiment.
>> MEREDITH WHITTAKER: Great Collin, thank you so much. I think that really brings it home and that graph is impressive.
So now I want to move it on to sort of -- discuss a different level of data. Another type of data collection. And I'm really happy to introduce Marco Hogewoning from RIPE Atlas (inaudible). He can talk about the way in which the data that RIPE collected led to accidental revelations about connectivity that was happening in Pakistan and then talk about RIPE and the RIPE Atlas measurement project generally.
I have a video here that may or may not play, and I'm just going to try to put that on as kind of a background tableau while Marco talks. But that may not work. So Marco please just take it away and I'll work with this.
>> MARCO: Thank you. As Meredith said, I work for the RIPE NCC, the regional Internet registry for Europe, the Middle East and parts of Central Asia. In our business we distribute and register IP addresses. Besides the core activity of doing that, we have quite a substantial research department. And that came into existence out of a bit of curiosity. Not only distribute IP addresses, not only register them, but try to see how they are used on the Internet. And that sort of turned into a quest of mapping the Internet at an infrastructure level that started 20 years ago with a project to try and count the number of computers connected through the Internet, the host count.
That idea now is impossible. Now we've got other things that look at the Internet. And if you are picturing the Internet as a patchwork blanket of independent networks that all interconnect, what we're primarily looking into and measuring is how these connections are made. How the topology of the Internet is formed and how it changes over time.
So we're not really looking at performance data. And we're not really looking into censorship. That doesn't mean that we do not occasionally see things in the form of serendipity show up on our measurements that indicate censorship.
Now obvious examples that we see, for instance, are the recent disconnects in Egypt and Syria where we simply see the networks disappear from the Internet topology. They no longer show up in our data.
To give you a bit of background, we used to have about three months of data online just to help people troubleshoot network faults, because our primary goal is to help people trace topology faults and find out what is happening.
Right now we have just over ten years of this data online, so we can build really long trends. And sometimes after incidents occur, we can go back and try to recreate what happened.
So just to quickly introduce this video, a few years back the decision was made in Pakistan to block YouTube. That was done. And sometimes people make mistakes. The first thing is that this was never bound to show up in our data: if everything had worked out as planned, YouTube would have been cut off in Pakistan and nobody else would have noticed it.
Some misconfigurations both in Pakistan and upstream of Pakistan caused YouTube to disappear for the whole world. And that delivered interesting data if you later on go back and visualize what happened. And with that, we can then point out, or at least assume, where mistakes were happening in the configuration.
So if Meredith will attempt at loading this video.
>> MEREDITH WHITTAKER: I don't think I can. But maybe you can -- there is a video here. And we can provide a link at the end so you can watch it. I think, you know, I trust Marco's narrative stylings to communicate sort of what happened. It was really interesting here.
>> MARCO: Yes, so the picture showed a snapshot of one of our visualization tools. It's kind of tiny, where every number represents a network on the Internet.
To the left -- to the right, I think, is YouTube here and to the left is Pakistan Telecom. And if you could play the video, which unfortunately we can't, you would sort of see that announcement, where Pakistan tried to pretend they were YouTube. That's quite a technical way of filtering something quick and easy: you pretend that you are those IP addresses and that means that all traffic redirects to you.
That was meant to stay in Pakistan, but unfortunately it got out to the world. And as it got out to the world, that message spread across the Internet. More and more networks decided that YouTube is in Pakistan. Let's go there.
So all traffic that was directed at Google, that was directed at YouTube, ended up in Pakistan Telecom, which, A, couldn't handle the load and obviously didn't have YouTube online, causing the rest of the world to panic and probably get easily bored because YouTube was down.
It's a shame that we can't play it, because it's a really nice visualization and you slowly see the world pick up on it. We see Google respond pretty fast in trying to mitigate the error and making several other messages to the Internet saying no, no, we're really YouTube. And we see some networks pick up on that and some networks don't.
And the end of the story is that somewhere upstream of Pakistan, somebody gets a call and hey, you made a typo, corrects the error and you see pretty much Pakistan Telecom cut off and the rest of the world restored to the original plan, which was YouTube and Google.
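One way to see why "pretending to be those IP addresses" pulls traffic away is route selection by longest matching prefix, which is how routers choose among overlapping announcements. The sketch below is a simplified illustration, not a model of BGP or of this specific incident: the prefixes are private-range placeholders, the origin names are made up, and the assumption that the false announcement was more specific than the legitimate one is ours, not the transcript's.

```python
import ipaddress


def best_route(destination, routes):
    """Pick the origin whose announced prefix matches most specifically.

    routes: list of (network, origin_name) pairs. Returns None if no
    announced prefix covers the destination.
    """
    addr = ipaddress.ip_address(destination)
    matching = [(net, origin) for net, origin in routes if addr in net]
    if not matching:
        return None
    return max(matching, key=lambda item: item[0].prefixlen)[1]


# Hypothetical announcements: the real owner covers a /22,
# the false announcement covers a /24 inside it.
legit = [(ipaddress.ip_network("10.0.0.0/22"), "video-site")]
hijacked = legit + [(ipaddress.ip_network("10.0.1.0/24"), "hijacker")]

print(best_route("10.0.1.10", legit))     # video-site
print(best_route("10.0.1.10", hijacked))  # hijacker
print(best_route("10.0.2.10", hijacked))  # video-site
```

Every network that accepts the false announcement makes this choice independently, which is why the change spreads gradually and shows up in topology data of the kind RIPE collects.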
Do you have my other slide?
>> MEREDITH WHITTAKER: I do.
>> MARCO: Okay. This is one thing, topology. Looking at that and the health of the Internet, we came up with a new plan. This is called Ripe Atlas. There are many atlases in the world. This is ours.
And we have got, and I've brought a few, these little tiny devices, which are tiny Linux computers that connect to a network, it can be in your home or in your data centre, and run really small measurements. The equipment is not smart or big enough to do content analysis. We can't do performance measurements. But we can do things like ping, we can make a basic network connection to a Web server or send out a DNS request.
Our goal is to have one sitting in each of these networks that make up the Internet. There are 50,000 of them. Right now we have got close to 5,000 of these little devices online.
On the map, every dot represents one of these devices. And you already see a few red ones. Those are the ones that, at the moment we took the snapshot, were not working. It could be a network failure at home or a power outage or whatever.
Now, an interesting bit in this network is that while we run some basic measurements to check on the K-root server, and try to map topology changes on the Internet, we built this as an open platform. People participating in the project can run their own measurements, and it can be done on an individual basis. If you host one of these machines or one of these little thingies at home, you get access: you collect points and with those points you can run your own measurements. Partners, people who step in and say I sponsor, also get access to measurement data. And they can run their own measurements and they can configure their own things.
So you can look into is this site blocked or not, or what type of DNS response do I get for a particular probe?
All the data is made public under the assumption that if you use our data, we require the result to be published and publicly available. That is pretty much our baseline. So people who are interested in hosting a probe, I've got two here and I think I've got a few more back at our booth. Please contact me. Other than that, for researchers in the room, have a look at this. Because this is -- we're not really running the measurements ourselves, we are just building the infrastructure. And Dominic and others can show what can be done with this information.
>> MEREDITH WHITTAKER: Wonderful. What I love about the Pakistan story is that it sort of makes such a clear connection between these technical decisions that .0001 percent of the world would understand or really care about, and their impact on people, right? To make a decision like blocking YouTube, you have to make one of these technical decisions and that can be detected. That shows up when you look at the network the way RIPE looks at the network.
Now I'll stay on the topic of Pakistan, but bring it to a sort of more local activist point of view. This is Shahzad Ahmad, and he is with Bytes for All, as the slide will say. And he has been doing measurements in Pakistan and using these measurements to facilitate human rights work on the ground. So I will let him drive the slides here, and explain this work.
>> SHAHZAD AHMAD: I was beginning to think that the world has forgotten that YouTube fiasco. But it is still alive.
Okay. Quickly, Bytes for All, we are an organisation based in Pakistan. We have been working on Internet censorship issues since 2007. These are the few issues that we work on.
We have been part of Citizen Lab's OpenNet Initiative since then, and we have been working on all these different issues. This is how we work. We conduct research to document evidence for policy advocacy and policy change, and capacity building of citizens.
So the next screen is very interesting, how it started. It really started with a panic alarm among Internet users in Pakistan when suddenly a lot of websites started disappearing in the country.
So -- and then we saw that there was a very visible increase in the filtering. Particularly when they blocked Tumblr, that was the time when there was sort of panic. What is happening?
So this particular image, saying that this website is not accessible and to surf safely, was the point that actually triggered this next group of research. One of the researchers at the Citizen Lab could sense that it was Netsweeper. And then all the efforts were directed towards this, because he knew about Netsweeper deployments in other countries, so that is how it started.
So they blocked a few Wikipedia entries in Pakistan. And then proxies also started disappearing. In any case we were already running a campaign on the YouTube ban in Pakistan. This is the Access Is My Right campaign. And in the same campaign, Access Is My Right, we were talking about censorship issues as well. And then how net freedom and democracy in the country.
So with this background, when -- I mean, we started looking at it, at what is happening in Pakistan. Only then we started country level of activity. So, we developed a list of URLs which were of national interest, national interest being related to Pakistani process. These were news websites and websites on religion. And then there was a matter, a large list of International significance URLs as well.
So these, a list of -- two lists were then developed and we then field test on an ONI tool that actually runs on a different online connection, on a different ISP, and then helps network measurement.
So using this system, we can -- I mean, it would actually go to each URL and assess what the status is. And then eventually results are uploaded to Citizen Lab's servers, where researchers can do further analysis of this.
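The per-URL assessment described here, comparing what is seen on the connection under test with what is seen on a different connection, can be sketched roughly as follows. This is a hypothetical illustration and not the ONI tool itself: the classify function, its result labels, the dictionary layout, and the use of the phrase "surf safely" as a block page marker are all assumptions made for the example.

```python
def classify(field, control, block_markers=("surf safely",)):
    """Classify one URL from a field result and a control result.

    Each result is a dict with 'status' (HTTP status code, or None if
    the request failed) and 'body' (response text).
    """
    if control["status"] is None:
        return "inconclusive"   # the site may simply be down everywhere
    if field["status"] is None:
        return "unreachable"    # fails in the field, works in the control
    body = field["body"].lower()
    if any(marker in body for marker in block_markers):
        return "blockpage"
    if field["status"] != control["status"]:
        return "different"
    return "accessible"


# Invented results for three URLs on a test list.
control_ok = {"status": 200, "body": "<html>news</html>"}
print(classify({"status": 200, "body": "<html>news</html>"}, control_ok))
print(classify({"status": 200, "body": "This website is not accessible. Surf Safely!"}, control_ok))
print(classify({"status": None, "body": ""}, control_ok))
```

Keeping the control result next to the field result matters for the later argument in court: it separates "this site is down" from "this site is reachable elsewhere but not on this network".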
So we came to know through this field testing that this IP, which you can see, is based on the Pakistan Telecommunication Limited network in Karachi. It was in Karachi. And then we could actually reach up to the Netsweeper admin panel over here.
So that is how -- looking at -- looking at the network analysis and measurement helped us to actually pinpoint what was happening and that was how we could assess.
So what we did with this report? Actually, when we were very sure, I mean that this has happened and that this is happening, the Citizen Lab developed a research brief that was launched in the media as well as in Pakistan, with huge coverage as well.
We are doing public interest litigation against the Federation of Pakistan on two issues. One is Internet Freedom, which actually includes YouTube banning in Pakistan. So YouTube has been banned for more than a year now in the country. So this is one case that we are fighting. And we have had 14 hearings on it already. Just to give you a good update on that, on the 19th of September the court referred this case to a larger bench, because of several issues. I believe that it's not very pleasant to talk about it. But then let's hope that it somehow reaches a pleasant end at some point and the platforms are unblocked.
So this particular research, when we submitted it to the high court, where the case is being heard, the Government actually rejected this report, saying that this is fake and this is not acceptable. And then they said that this organisation does not have this capacity and how can they develop this, and they have made it up and they are maligning Pakistan. And they just ended up just throwing it out.
So then they said why not initiate a civilian case against the petitioner, because it's bringing a bad name to the country. And that would unsettle anyone. But we had very authentic and proper research backing us, with the proper data. And luckily the Judge himself had seen this report on the website of Citizen Lab, so he dealt with him accordingly as he had to. So that was a little interesting thing.
This is the poster for the campaign.
So that is the quick story. I just wanted to say that these kinds of data, research, and analysis are extremely important for effective policy advocacy. Because the situation is changing. Many more countries are now heading towards controlling the Internet and filtering. And you know it's not only -- Pakistan is a democratic country and we are proud of it. But still we are doing it, there are repressive regimes as well, and they will do it more.
So it is extremely important. We know of these cases and we know how to go about it. And when you go with proper research and talk to and face these people, the policymakers or the judges in the court, it makes your case very strong and it helps in various ways.
Thank you very much.
>> MEREDITH WHITTAKER: Thank you so much, that was great.
That was a -- yes, that really brought it home. And thank you for your work there.
Now we're going to come into the present and really talk about some of the work that Citizen Lab is doing. You know, has been doing for years and has been doing this week.
This is not theoretical. We're going to discuss some of the results from their measurements at IGF and in Bali, looking at network practices here, right now. What are the differences between the networks that we as IGF attendees can access and the networks that are available to people who live here on the ground.
And I'll let Masashi and Daru, who work with Citizen Lab and have been doing these measurements, talk about them. And I think just give an overview of some of Citizen Lab's work over time, because I think that really helps frame where we are today.
>> MASASHI: Thank you, Meredith. Just to get started, I wanted to zoom out for a second and talk about our general research area, which is trying to understand information controls. And we do this with our partners, such as Bytes For All, Daru, and other members that joined us at the IGF this year.
And just to give a definition of information controls, we consider them broadly as actions conducted in and through communications technologies that either deny, disrupt, manipulate or monitor information for a political end.
So what does that mean? Here is an example of information controls. And we will go through what the categories are. It's not comprehensive, but just shows the diversity of the issues that we're looking at around the world.
So today we have been talking a lot about information denial. Internet filtering, throttling, other ways it can be done. There are denial of service attacks, also nontechnical means, broad regulation, broad use of libel and slander laws in some countries. And the objective of all these things is to deny information from reaching the user. And that can be done through a variety of different means and for a variety of different rationales.
That's not the only kind of information control. There are also controls that seek to manipulate or project information. This can be done by compromising a website, changing the physical appearance to have content that might go against the message that a site has. If it's a site of an activist group, for example, all of the sudden the message that is there is against one of their campaigns. Maybe it can be altered and it can be changed. It can be done on a Government site. It can also be done through online propaganda, whether that's trying to project a message that (inaudible) or social media that is (inaudible.) So again it's not to deny the information, but to manipulate the discourse that is happening on our online sphere.
Another area that we are concerned with and do a lot of research on is information monitoring and surveillance. You can consider two kinds of monitoring. Some are passive. They are trying to collect as much information as possible, and the recent NSA revelations shined a light on how that might operate in one country. And others are more directed, such as targeted malware attacks against individuals that compromise their home networks.
While there are multiple controls that are being exercised, there are also multiple actors that have to be considered. There are States, Civil Society, terrorist groups, cybercriminals and other groups out there, and also the private sector. And each one of the actors has a different place in political, legal, and technical systems. And each one of them is trying to assert different agendas and having different influences over these systems. So what is important is that to understand these different controls and these different actors, you really have to take a holistic approach and use mixed methods techniques.
So what we are talking about today, using network measurements and other means of forensics. And what we do in the lab is we try to combine that with social science theories and methodologies and policy analyses.
So Indonesia is a really good example of why you require this extensive research approach in an effort to understand the situation here in terms of information controls.
So just to first start with the sense of the technical infrastructure, Indonesia is a little bit complicated. So there are over 300 Internet Service Providers here. So what the diagram shows, in the red, those are Indonesian autonomous systems and the different colors are foreign systems working under the entity of a particular entity or authority.
So the middle layer of nodes are Indonesian networks that have upstream connectivity to foreign networks, which are on the top. So the point is that this is very decentralized. In 2012, Renesys put out a blog noting that due to this decentralization, Indonesia is likely to be extremely resilient to Internet disconnection, which is interesting.
So using the general methodology that Shazad explained -- sorry I clicked the wrong computer. We have been doing network measurements in Indonesia for a long time. This shows the summary of our measurements from 2008 to 2010.
And the takeaway here, and there is a longer paper if you're interested in this methodology and also the data is available, is that just as the technical infrastructure is decentralized, so too are the techniques and the means used to filter content. In Indonesia there is a general focus on the filtration of pornography and gambling content. But there are other content filters as well and we will get into that in a minute.
So without going into the details of this graph, you can just see that across the different ISPs, there are different means of filtering. So just as the infrastructure is decentralized, so too are the controls.
So just to bring it down to where we are here at the IGF, we have been, as Meredith mentioned, running a project this week on trying to monitor information controls in and around the venue itself, looking at policy practices, and the debates that have been shaping various events, and taking this as an opportunity to explore and analyze wider issues around Internet censorship and surveillance in Indonesia.
And I'm very happy today to be joined by my colleague Daru, who will explain how the networks that you all have been using the last week work.
>> DARU: So in this venue, there is a sign in between the host and IGF. And there is an open Internet connection available and provided.
And the primary -- sorry.
The primary wireless network is identified as IGF 2013. And also IGF 2013-A. And two other networks are IGF 2013.ID and IGF 2013@Indonesia.
And these ISPs are the two largest providers in Indonesia.
>> MASASHI: So we ran some measurements during this week to try to understand how these networks work, and to verify whether the IGF network that Daru explained was free from filtering. And it was. However, the other two that depend on the two largest Internet service providers have the same controls as elsewhere in the country. So if you were on the one at the bottom here, and you went to a particular website, your content would resolve to one IP address and you'd be directed to the block page, which was the Web Page for Trust Positive. They have a booth outside. And we will go into details by talking to Daru about what that means.
So we tested a sample of 1,387 URLs and found that 197 of those URLs were blocked through DNS tampering. A variety of content was blocked, pornography, LGBT and religious content, and other things.
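The DNS tampering check described here can be sketched roughly as follows. This is an illustrative Python example, not the Citizen Lab's actual tooling. It assumes the DNS answers were already collected, one set from the network under test and one from a trusted control resolver. The function name `find_dns_tampering`, the `min_shared` threshold, and the example addresses (taken from reserved documentation ranges) are all hypothetical.

```python
from collections import Counter


def find_dns_tampering(field_answers, control_answers, min_shared=3):
    """Flag domains that the field network resolves to a shared block-page IP.

    field_answers:   dict of domain -> IP address returned on the tested network
    control_answers: dict of domain -> set of IP addresses seen from a control
    min_shared:      how many domains must share an IP before it is suspect
    """
    # Count how many different domains resolve to each IP on the field network.
    counts = Counter(field_answers.values())
    suspects = {ip for ip, n in counts.items() if n >= min_shared}

    flagged = []
    for domain, ip in field_answers.items():
        # A shared IP is only suspicious if the control never returned it,
        # since unrelated sites can legitimately share hosting.
        if ip in suspects and ip not in control_answers.get(domain, set()):
            flagged.append(domain)
    return sorted(flagged)


field = {
    "a.example": "203.0.113.10",
    "b.example": "203.0.113.10",
    "c.example": "203.0.113.10",
    "d.example": "198.51.100.7",
}
control = {
    "a.example": {"198.51.100.1"},
    "b.example": {"198.51.100.2"},
    "c.example": {"198.51.100.3"},
    "d.example": {"198.51.100.7"},
}
flagged = find_dns_tampering(field, control)
```

The idea is that many unrelated domains resolving to one address on the tested network, and to different addresses elsewhere, matches the pattern of being redirected to a single block page.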
We will get into the context in a minute. So this is the other network offered by Indosat. We tested the same sample URLs and found that 197 were filtered. Again a variety of content, independent media sites, religious content and certain Convention sites.
So this is just a breakdown of the different content that we found blocked on these networks. Also as a means of comparison, we ran measurements on another network, which we did through a 3G connection tethered through the phone. As you can see the focus, the top bar is pornography. And it goes through other categories related to social issues. And you can see a lot of overlap there in terms of focus on pornography between the sites.
Here you see -- sorry. Here you see political site, things dealing with political reform, women's rights, free speech, some overlap between these. These are content related to Internet tools, eContent, e-mail providers, and so on and so forth.
So just to give a sense of the overlap between these, the highly decentralized environment of Indonesia means that there can be a variation of how content is filtered between ISPs. However, our results just with these three ISPs out of the 300 available in this country, do show some variation in filtering. There is a general overlap in terms of the content filtered. There is definitely a focus on pornography. But we also see the blocking of non-pornographic LGBT content. And one other notable area of difference is anonymizers and messaging tools, which we saw more heavily filtered on the (inaudible) network than the IGF.
So that gives you a sense of the network measurements that we have done, again in a limited sample of what we have been able to do this week. But technical measurements alone don't tell the whole story. You really have to understand the qualitative dimensions, the legal dimensions, and what civil society has been doing in the country.
I will turn it over to Daru to talk about the government.
>> DARU: When we talk about this Internet filtering issue, we have to move back to 2008, with electronic information and limited freedom of expression and privacy information.
This shows there was hostility based on race and ethnicity and also religion. And if you see that on this content, we will see LGBT website and social networks that may be included on this regulation have been blocked.
And another regulation is antipornography law, which was first proposed in October 2008. And it's a position from the group and it shows cultural differences.
The law criminalizing the distribution and the use of pornography said that anyone distributing pornography could face up to 12 years in prison and be fined like 6 million rupiah. 100,000 US dollars. Antipornography laws are so aggressive as implemented by many ISPs, and that's the first stage of the DNS filtering. And after that, it became DNS Nawala, and now we have Trust. So -- but because the centralized ISP and we have like two others and maybe more now, it's becoming fairly difficult. If you get blocked by one ISP and if you want a remedy, it's quite difficult.
>> MASASHI: So we'll be publishing reports on three issues looking at the infrastructure and governance environment of Indonesia, analyzing content controls, and also exploring surveillance by the end of today. So you can have more details on it. And I'd love to open the floor to discussion.
>> MEREDITH WHITTAKER: Thank you so much. I think that's a lot to take in. There are many technical terms mixed with some political narrative. Please ask questions to clarify anything. We will put up links to reach the data and watch the video. But I want to again echo Masashi and open the floor to questions after I thank the panel. This was really, really stellar. I'm really grateful for you guys to be here after all the hard work that you did that brought you here. And especially the Citizen Lab, who has really been leading the charge on this type of research, this data-based analysis of ground truth. So thank you.
Does anyone have questions?
I see Ali.
>> ALI: Hi. I'm Ali Banja. We did similar research on Iran and we realized that the information control in Iran is very dynamic. Prior to the election they increased information control.
My question is to the group that did the -- looked into the information control during the IGF, did you notice a significant change, like from a month ago to this week, or did you see a change in the implementation of the control by these ISPs?
>> MASASHI: So we were just running measurements during the week that we're here. We do have the longitudinal data going back from 2010 to compare. And the importance is just looking at the decentralized network, architecture and infrastructure, and the decentralized policy and practice of filtering generally in this country. And we just wanted this to be an example for people to take a deeper dive and have a greater awareness of the situation in Indonesia.
>> MEREDITH WHITTAKER: And since this room is a little odd and there is no walking mic, this mic right up here is open to people who aren't in front of the microphone, if you want to stand up over here in front of the microphone, it seems as good a solution as any.
>> Hi. I'm Ashafi from Malaysia. You mentioned that you know when you took the data to the court, they did not think that this was admissible as supporting evidence for the case. So how do you think that the law can narrow the gap? Because maybe the Government or the legislators, they don't really know about these technologies. How can you use this data to make -- to state your case? How do you narrow the gap between the law and also the development in this area?
>> MEREDITH WHITTAKER: I want to actually, I think Shazad you can answer that, and then I want to direct that question to Dominick who thinks a lot about verifiable data and open data and how can you provide something that can be verified?
>> SHAZAD: It was not that there was any difficulty in court. It was actually the Government liar -- I don't know why I continue to say "liar." It's "lawyer." "Lawyer." So it was him who thought that this is not admissible and this is fake and this is made up.
So -- but the court didn't have any problems. So it's actually it's very important and useful that when you make a case and then when you -- so you have this kind of evidence with you. It only strengthens your point and then it happens in several cases.
>> DOMINICK HAMON: Yes, I'll just add to that. This is one of the reasons why having the data and the analysis be open and public is vital. Because if you have someone challenging the veracity of the data, if it's open, if other independent parties can do similar analysis, it can only strengthen your case and brings it tightly. It doesn't bridge the gap in the way that I think you're looking to, but at least it makes it a little easier to back yourself up with independent parties who can verify what you're saying.
>> MEREDITH WHITTAKER: So before there are any other questions, I just want to say. There are a lot of buzz words. What does "open" mean? Does that mean you can go to a website and read about it? What is "open" versus "closed" data?
>> DOMINICK HAMON: So open data means it is freely and -- I'm trying to -- it's like a game show. I'm trying to say it without saying the word "open." It's really hard.
>> MEREDITH WHITTAKER: Baaa.
>> DOMINICK HAMON: Right. No hesitation. Yes, "public" and "open" are kind of synonymous in this sense where the data is available to anyone to access, to get hold of. The -- I would go further and describe it as having the methodologies of collection also publicly described and available to anyone to access. It means no barriers to entry. It means uncontrolled. It also, in our sense for M-Lab, the measurement lab, we specifically try and -- well, we don't try. We specifically do not aggregate the data. Unaggregated, unfiltered, unprocessed, because in that way there is no risk of being accused of manipulation. Again, it strengthens cases when you can just show the path of data from collection to exposure to public.
>> MEREDITH WHITTAKER: So replicable science.
>> TIM: Just the question with regard to the lawyers. And I think what this panel would say, especially the approach that the Citizen Lab has taken, is that it's important to have the data and to have verifiable data. But you need to also have that intermediary step of educating people about the utility of that new data and how to interpret it. And the Citizen Lab has been doing it successfully and working with groups in those countries who know the context and to make that transition exercise.
But similar to when we saw fingerprinting becoming a tool in criminal law, that will take time and we can accelerate that by working and reaching out to lawyers and law schools and making them aware of these new tools and evidence that is available. But it will take time and, obviously, resources.
>> MEREDITH WHITTAKER: Shazad.
>> SHAZAD: On this note I would also like to mention that there is a lot of talk about surveillance in different countries, particularly if it's a special team on surveillance here or at IGF as well.
So if the censorship report that was published by Citizen Lab, we didn't have that research, we would not be able to get into any of the public interest litigation.
The Government of Pakistan, we have specifically 89 cases that we lost on surveillance and how it was used. So it was totally based on that one report that we could take it and then the Government admit -- sorry, the court admitted it. And the process is still not finished on that, and we have to have a second hearing. But that is another example of how research can be used by activists in the country effectively.
>> MEREDITH WHITTAKER: Great.
So we're hearing you mention Netsweeper, you mentioned FinFisher. And I think this is a great way to sort of bring this back around to some of the work Tim is doing.
What is a FinFisher? What is a Netsweeper? And how do we connect this to maybe a more complicated but more realistic ecosystem?
>> TIM: So this is now in the work that also overlaps with the Citizen Lab's. So please chime in here.
We started looking in a more systematic manner into tools that were used for censorship and surveillance and are now coming because of the new technology that is available. FinFisher is a good example for software that has been used to spy in countries like Bahrain, but also countries that are having sanctions at least in the U.S. context.
And in terms of the work that we are doing is using the work that the Citizen Lab and other groups have produced in terms of analyzing their technology and then tying that to our policy analysis by looking at what existing -- what is part of already the export control regime. And in the U.S. you have certain provisions with regard to path controls and they already allow for review of such technology, but where are the gaps? And that's where the research and analyzing and taking apart what we are talking about is critically important because it allows us to look at what kind of language in the existing policy already could be -- already matches the description of that technology. And to what degree are there limits. And Collin might want to add to that because he has been doing a lot of work on that.
And we see FinFisher was one example. We have now with this measurement, data that we talk about here, the ability to actually go deeper and find out where those technologies occur and in what countries. And then find out also what kind of lists of countries or what kind of indicators do we need to use to apply for, say, these particular countries. We want to review whether this technology should be used there or not. And I'm happy to talk more offline.
>> MASASHI: Just to add a bit of reference to some of the research that we have done on these tools, so particularly around FinFisher, which is marketed as a Governmental IT intrusion, that is the way that it was described by the company that develops and markets it. They claim that it's only sold to law enforcement agencies and other Governmental agencies for programmes such as lawful access. Quote unquote.
However, as Tim mentioned, we found it directly targeting activists in Bahrain and Bahraini activists who live in the United States. And through some measurement studies, our colleagues have detected the presence of command and control servers. So servers that send commands to clients that are infected with FinSpy, which is part of the FinFisher suite, in over 36 countries, including here in Indonesia. And we will speak about that in the blog post that will be published later today.
Importantly, the presence of a command and control center in a country does not necessarily imply that the Government of that country has -- owns or operates FinFisher. However, it opens some very interesting questions. And Shazad mentioned we also found a presence in Pakistan also. And this is an important aspect of this research that has to be community driven.
So we are university-based researchers. We can create an evidence layer and help inform people about what is happening out there, but we really depend on our partners and others to take that evidence and run with it and try to ask these questions of their Government and otherwise of why are these devices being detected on these networks? What does it mean for broader implications for privacy of citizens in those countries?
I'll just quickly add that Netsweeper is software used for Internet filtering, developed by a company based in Toronto, Ontario. We found it of course in Pakistan but also being used to filter human rights content across countries in the Middle East. And we actually found an installation of Netsweeper on an ISP in Indonesia. That was interesting.
That just further goes to the point of how decentralized filtering practices and techniques are in Indonesia and why you have programs like Trust (inaudible) and there is a lot of variation between their ISPs.
>> MEREDITH WHITTAKER: I think we might have remote questions and we will take them. But I also wanted to leave this conversation with the question of where is FinFisher coming from? We talked about Pakistan, we talked about Iran, we talked about Indonesia. But these are using tools that, you know, I think Tim and others are well aware of may not come from these countries.
I -- okay.
And then remote questions?
>> REMOTE MODERATOR: So we have Karen Wu here from Malaysia. She asked two questions. The first is for Collin.
I understand that there is 4 terabytes of data that is collected. My question is: How do you ensure that the data is not compromised so that the users cannot be identified?
And the second is for Marco. I understand that you have computers that collect data from the Internet to understand more of the Internet. My question is whether the study is limited to public networks or does it include private networks?
>> COLLIN: So, this is what I was saying about the responsible collection of data. We spend a lot of time with the researchers that write their measurement tools that run the experiments, that provide the data that we manage at the measurement lab. We spend a lot of time with them to make sure that the data that they are collecting is anonymous. And this is one of the things I was mentioning earlier with the responsible collection of data. We have had researchers ask us if we would collect and process -- sorry. -- and store data which has included problematic data points, like geolocation. Like unique IDs. Like the -- I can't even think of the list. But specifically with mobile data. It's pretty tricky.
And we have had people come to us with very, very long lists, and we have had to turn them down because we say we are not going to expose users to that level of risk. Once we have that data out there, we can't take it back and it's prime for mining, and we don't want to expose people.
We just -- the only way we can do it is to work very, very hard to not expose people. But that occasionally limits our ability and limits the things that we can collect, which is what I was talking about earlier, the tension.
>> So I think "tension" is an operative word because there is always a debate as to how much data you are going to collect and to what level of granularity.
Take, for example, NDT. NDT is what I use for IP addresses. In some cases -- I mean, in most cases, IP addresses can lead to the deanonymization of an individual. However -- so that graph was from a paper that I had written on this trend. I took great pains to emphasize, including on the politicization of the NDT data, that the original mechanism was not a political -- was not a politically inspired data point. Which was to say that this was somebody who was not participating in censorship research. This was a normal functionality of a legal tool.
Now, the more that you get into data collection that is politicized, the more that this becomes a larger question than sometimes even the development of the tool. And so if you take, for example, if you look at the development history of OONI probe, which is an upcoming and slowly developing censorship assessment framework, this is -- this has been at the core of a lot of the development.
And what you start to do is you start to have to sort of fuzz your data to be nonspecific to say that we're going to report the name of the network, but not necessarily the IP address. Or that we might round off the time.
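The kind of fuzzing described here (reporting the network name but not the IP address, and rounding off the time) might look like the following minimal Python sketch. It is an assumption-laden illustration, not the code of any tool mentioned in the session. The record fields (`client_ip`, `timestamp`, `result`), the `network_names` lookup table, the `fuzz_record` function, and the example values are all hypothetical.

```python
from datetime import datetime


def fuzz_record(record, network_names, round_minutes=60):
    """Return a less specific copy of a measurement record.

    The client IP is replaced by a network name and the timestamp is rounded
    down to the start of its time bucket. round_minutes should divide evenly
    into an hour or a day (for example 15, 60, or 360).
    """
    ts = record["timestamp"]
    # Round the time of day down to the nearest multiple of round_minutes.
    minutes = (ts.hour * 60 + ts.minute) // round_minutes * round_minutes
    rounded = ts.replace(hour=minutes // 60, minute=minutes % 60,
                         second=0, microsecond=0)
    return {
        # Hypothetical lookup: a real system would map the IP to its network.
        "network": network_names.get(record["client_ip"], "unknown"),
        "timestamp": rounded,
        "result": record["result"],
        # The client IP itself is deliberately left out of the output.
    }


raw = {
    "client_ip": "203.0.113.5",
    "timestamp": datetime(2013, 10, 25, 14, 37, 22),
    "result": "blocked",
}
public = fuzz_record(raw, {"203.0.113.5": "ExampleNet"})
```

How coarse the rounding should be is exactly the tension being discussed: coarser buckets protect users better but make the data less useful for research.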
And so there is a very large now I think set of conversations, of papers, even to a certain extent a research field on the ethics of data collection that I think are really interesting to go through. And I think on top of that, what I would strongly emphasize is that this is a debate in which there is a strong need for challenge at every point that anyone makes a decision on what level of data to collect.
And I would invite everyone in the room to participate in that debate. Because a small number of voices leads to groupthink, which leads to bad decisions and missing things that might potentially be costly.
>> Thank you.
The question was brought up