One World, Diverse Content, and Flexible Access
02 September 2014 - A Workshop in Istanbul, Turkey
The following is the output of the real-time captioning taken during the IGF 2014 Istanbul, Turkey, meetings. Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid to understanding the proceedings at the session, but should not be treated as an authoritative record.
>> DR. NOHA ADLY: Okay. Good morning, everyone. I would like to welcome you to the workshop "One World, Diverse Content and Flexible Access." The workshop today addresses two themes: how to maintain cultural and linguistic diversity, and the government policies required to enhance the creation and dissemination of local content.
I would like to thank the organisers. First of all, I would like to thank the Bibliotheca Alexandrina and the Ministry of Communications and Information Technology in Egypt, which I'm representing, and Dr. Ismail Serageldin, the Director of the Library of Alexandria. I would also like to thank the co-organisers, UNESCO and UN-ESCWA. And I would like to greet and thank all of our speakers who have joined us today, representing prominent entities from all over the world.
Specifically, I would like to thank Dr. John Van Oudenaren from the World Digital Library of the Library of Congress; Dr. Indrajit Banerjee from UNESCO; Dr. Elycia Wallis from Museum Victoria in Australia, who will participate remotely (I hope we can solve the connection issue with Dr. Wallis so she can join us); Dr. Lorrayne Porciuncula from the OECD; Dr. Haidar Fraihat from UN-ESCWA; and Mr. Makane Faye from UN ECA.
The objective of the workshop is basically to shed light on how cultural and linguistic diversity can be maintained within the information and knowledge society using Information and Communication Technology. It also aims to explore different approaches to creating and disseminating local content, in order to reach horizons of global collaboration that extend beyond national and linguistic boundaries; to pay attention to local content creation, making endangered content more accessible and finding means to protect it; and finally, to propose recommendations for meeting the challenges of creating and improving an enabling environment for the development of local content.
So I would like to start with our first speaker, Dr. Ismail Serageldin, whom I do not think I need to introduce; he's very well known. Dr. Serageldin is the Director of the Library of Alexandria and chairs the research centre of the library. He chairs many international panels and committees all over the world, and I know that he has a very special passion for Africa.
He has been Vice President of the World Bank, and he has held a professorial chair at the College de France, among many other international positions.
So, Dr. Serageldin, I know very well that the BA has made many efforts in digitizing cultural heritage content and promoting public access to archives and content, but it has also contributed a great deal to supporting local content creation, specifically by supporting multilingualism. It has worked on creating an enabling environment that supports the community's contribution to local content and encourages the community to participate. And you have extended these efforts not only within Egypt but to the region and other parts of the world.
So I would like you, if you can, to share with us the efforts the BA has been making in this area. Thank you.
>> ISMAIL SERAGELDIN: Thank you very much. Yes, I would very much like to give a quick view of what we're doing to build intercultural bridges at the Library of Alexandria, provided, of course, that this works.
>> DR. NOHA ADLY: What happened to the --
>> ISMAIL SERAGELDIN: Somebody changed the batteries -- took it to change the batteries and it doesn't work.
>> ISMAIL SERAGELDIN: Well, anyway, I wanted to give you two examples of what we do.
The library has very advanced informatics thanks to Dr. Mehlab, who, before going on to run the Government of Egypt's I.T. concerns, helped build up the Library of Alexandria's I.T. Dr. Magdi Nagi, who is there, is also a very central person. We have very many such programmes, but one that I wanted to address is called UNL, the Universal Networking Language, which started at the United Nations University in Tokyo, where the idea initially was machine translation. The effort was built on a hub-and-spoke design: instead of trying to build programmes that translate from each language directly into every other, each language would be mapped into a computer language, essentially a logical structure of grammar and syntax, called UNL. So you would go from English into UNL, and from UNL into Arabic.
Why is that important? It allows each small language to have its linguists link it to UNL, so you can go directly from Armenian through UNL into Arabic. At the Library of Alexandria, we suddenly have a translation possibility between Armenian and Arabic, which would have been impossible otherwise. So that idea was started, and we succeeded, because beyond building the Arabic part we also built a huge database, called the International Corpus of Arabic, with 100 million words. Each word has 16 lexical qualifiers, which allows us to build a table with approximately 100 million lines and 16 columns.
From that table we can disambiguate individual words, so as to know how they are used in each context.
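To make the arithmetic behind the hub-and-spoke argument concrete, here is a minimal Python sketch. It assumes each translation component handles one direction only, and it uses toy word-for-word lexicons; the names `LEXICONS`, `C_LIBRARY` and the helper functions are hypothetical. Real UNL represents whole sentences as semantic graphs, so this illustrates only the counting and the routing through a shared hub.

```python
from typing import Dict, List

# Toy lexicons (hypothetical): each maps words to language-neutral concept IDs.
# Real UNL represents whole sentences as semantic graphs, not word lists.
LEXICONS: Dict[str, Dict[str, str]] = {
    "en": {"library": "C_LIBRARY", "book": "C_BOOK"},
    "es": {"biblioteca": "C_LIBRARY", "libro": "C_BOOK"},
}


def direct_systems(n: int) -> int:
    """Pairwise design: one one-directional system per ordered language pair."""
    return n * (n - 1)


def hub_systems(n: int) -> int:
    """Hub design: one encoder into the interlingua and one decoder out of it, per language."""
    return 2 * n


def encode(words: List[str], lang: str) -> List[str]:
    # Spoke in: source-language words -> concept IDs.
    return [LEXICONS[lang][w] for w in words]


def decode(concepts: List[str], lang: str) -> List[str]:
    # Spoke out: concept IDs -> target-language words.
    reverse = {concept: word for word, concept in LEXICONS[lang].items()}
    return [reverse[c] for c in concepts]


def translate_via_hub(words: List[str], source: str, target: str) -> List[str]:
    # Any source reaches any target through the shared hub representation.
    return decode(encode(words, source), target)


if __name__ == "__main__":
    for n in (3, 7, 20):
        print(f"{n} languages: {direct_systems(n)} direct vs {hub_systems(n)} hub components")
    print(translate_via_hub(["library", "book"], "en", "es"))
```

With three languages the two designs cost the same (6 components each); at seven it is 42 against 14, and at twenty, 380 against 40. Adding a new lexicon, say for Armenian, would make it reachable from every existing language without touching their entries.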
It's finally come on. Let me see. I'll skip this very quickly.
So these are the two issues I'm giving you examples of: the issue of language, and empowering science by working with Africa. On language, with UNL the basic idea was not to have individual programmes linking pairs of languages, but rather to have a hub. While this looks more complicated, in fact, as you can see, when we add French or German, they open up French to Arabic and so on; it is now becoming a reality. This is what UNL looks like: it's not a spoken language but purely a computer language. We are responsible for this part, and we have also used it to do additional work. This is the team that did this work; you can see them at the centre, and you can ask the doctors whatever you want about it. We have 100 million words representing Modern Standard Arabic that are syntactically analyzed. These are the sources we have taken: academic research, and genres such as arts and culture, biography, religion, sports, et cetera. So far we have collected 120 million words; we have mapped them to capture differences between countries and differences between genres, and then 2 million words were analyzed. This is what it looks like.
These are the 16 lexical qualifiers. The table looks like this: 100 million lines and 16 columns filled out, and each entry has metadata so that information can be retrieved, for example verbs in the present tense, and you can correct them. Then we took a test bed of 2 million words that were corrected by experts. Having done this by hand, we know where the errors are, so we can measure precision: we run the programme on the 2 million words, see what we missed, correct it, tweak it, and get better and better. Very soon we will reach an accuracy level of 95 to 98 percent; since I made that slide we have reached 91%, so it's advancing quite well. Now, the achievement of disambiguating meaning is that you can build -- I'm sorry; this thing is not working.
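The measure-and-tweak loop described above can be sketched as a comparison of the programme's labels against the expert-corrected test bed. This is a minimal Python sketch, not the Library's actual evaluation code: it computes plain token-level accuracy for a single qualifier column and returns the positions the system missed, which are the cases to inspect before the next round of tweaking. The function name and the part-of-speech labels in the usage example are hypothetical.

```python
from typing import List, Tuple


def evaluate(predicted: List[str], gold: List[str]) -> Tuple[float, List[int]]:
    """Compare system labels with expert-corrected labels, token by token.

    Returns the accuracy and the indices of tokens the system got wrong.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    if not gold:
        raise ValueError("empty test bed")
    misses = [i for i, (p, g) in enumerate(zip(predicted, gold)) if p != g]
    accuracy = 1 - len(misses) / len(gold)
    return accuracy, misses


if __name__ == "__main__":
    # Hypothetical labels for one qualifier column (e.g. part of speech).
    gold = ["NOUN", "VERB", "NOUN", "PREP"]
    predicted = ["NOUN", "NOUN", "NOUN", "PREP"]
    acc, misses = evaluate(predicted, gold)
    print(f"accuracy={acc:.0%}, missed tokens at {misses}")
```

Here one of four tokens is wrong, so the sketch reports 75% accuracy and points at index 1; on a 2-million-word test bed the same comparison yields the running figure quoted in the talk.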
The disambiguation is particularly important because it enables you to tell apart Arabic words that are otherwise ambiguous. With a word like alamin (phonetic), you can't tell if it's alam, al, because there are no diacritical marks on it, so we had to design a programme of contextual interpretation that allows us to do this kind of work.
This has been quite successful. And I dare say -- bring the next one, the next one, come on, move. It's not working, it's not working.
And the key thing is that Google had tried to do Tashkeel (automatic diacritization). We succeeded in doing that, because the construction of the Arabic language is such that you're not relying only on the position of the word to understand the sentence; you are looking at the last vocalization on the word.
And most of the available text doesn't help us tell what the word is. We need to understand the sentence, and we have developed programmes that do this with a great level of success.
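The talk does not describe the internals of the contextual-interpretation programme. As an illustration only, here is a deliberately simplified Python sketch of the general idea: it picks the vocalized reading of an undiacritized form that most often follows the previous word in an annotated corpus, and backs off to the form's most frequent reading overall. The corpus rows, the transliteration ("3" for the letter ayn), and the function names are hypothetical, and a bigram count is far cruder than the sentence-level analysis described above.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical annotated rows: (previous word, undiacritized form, vocalized reading).
# "3" transliterates the letter ayn, as in informal Latin-script Arabic.
ANNOTATED: List[Tuple[str, str, str]] = [
    ("talab", "3lm", "3ilm"),    # context suggesting "knowledge"
    ("talab", "3lm", "3ilm"),
    ("rafa3a", "3lm", "3alam"),  # context suggesting "flag"
    ("wa", "3lm", "3ilm"),
]

Model = Tuple[Dict[Tuple[str, str], Counter], Dict[str, Counter]]


def train(rows: List[Tuple[str, str, str]]) -> Model:
    by_context: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    overall: Dict[str, Counter] = defaultdict(Counter)
    for prev, bare, reading in rows:
        by_context[(prev, bare)][reading] += 1
        overall[bare][reading] += 1
    return by_context, overall


def disambiguate(prev: str, bare: str, model: Model) -> str:
    by_context, overall = model
    # Prefer the reading seen most often after this previous word;
    # otherwise back off to the form's most frequent reading overall.
    counts = by_context.get((prev, bare)) or overall.get(bare)
    if not counts:
        return bare  # unseen form: leave it undiacritized
    return counts.most_common(1)[0][0]
```

With `model = train(ANNOTATED)`, `disambiguate("rafa3a", "3lm", model)` returns `"3alam"`, while an unseen context such as `"kitab"` falls back to the overall favourite, `"3ilm"`.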
So that is one of the programmes we have achieved so far. The other programme I wanted to show you relates to our work in Africa. There we have two problems, of course. One has been access to science, and with Elsevier we have been able to get 150 free subscriptions to Scopus, giving access to research material throughout Sub-Saharan Africa. But then we found out that although the developing world has 80% of the population and 20% of the scientists, it has only 3% of the scientific publications. We talked to a number of editors, and they said the key problem is the quality of the research design and analytical work coming from developing countries. So we created the RNLA. It has 5,000 items in it: lectures, material, research, and it is supported by about 300 top experts in statistics who help work with local researchers. We have even created a button that they can put on their Web site: press the button and you are inside the Research Methods Library of Alexandria. So far we have been able to reach a large number, about 37 such buttons in Africa. We have over 16 buttons and 9 accuracy and 30 individuals, so our network is beginning and expanding. One aim is to bring access to information; the other is to empower people to produce their own information.
Matching that, we also have, of course, the work on translation and on the disambiguation of Arabic for the whole younger generation of Arab students. These are some of the examples; I could go on and on, there are many more, but I will stop here to make sure I leave time for my colleagues.
So anyway, I think the presentation would have been better with the slides. But in the absence of the slides, I think the general ideas are still correct. So thank you very much.
>> ISMAIL SERAGELDIN: Never mind, I think we'll just give the floor to others.
>> DR. NOHA ADLY: Thank you very much for your intervention, and I'm very sorry for the technical problems we are facing with the presentation. I'm sure the visual aids would have conveyed more, but your intervention has let us know what the library is doing in these two major initiatives. Following on from that, I would like to call on Dr. John Van Oudenaren, who is directing the World Digital Library at the Library of Congress. Dr. John, the World Digital Library is a well-known initiative that promotes linguistic diversity and multilingualism, specifically in content creation. What challenges has this initiative faced in creating and maintaining a multilingual Web site such as the WDL, and do you find that the investment made in multilingualism has paid off so far? Also, the interface of the WDL is in seven major languages, all spoken by hundreds of millions of people. But how about content in languages spoken by smaller numbers of people? Does the WDL have anything to offer on this front? Thank you.
>> DR. JOHN VAN OUDENAREN: Thank you to the organisers for inviting me here and for this opportunity. Okay. Since time is short, let me get started. This is the mission and objectives of the World Digital Library. Our mission is basically to put cultural content online for free and make it available to as many people as possible, with a number of objectives regarding promoting intercultural understanding, linguistic diversity, capacity building and so forth.
Very quickly, I won't go over the whole timeline, but the idea goes back to 2005, when Dr. Billington of the Library of Congress proposed the idea of a World Digital Library. We then worked with UNESCO to develop a prototype and a governance structure. As you see there, in March 2010 we adopted the WDL charter. I should mention that under the charter we have an Executive Council, and Dr. Adly is the Chairman of the Council. And, as the last bullet on the slide says, we have actually launched a new beta site for people to look at, which will eventually replace the old site with the same functionality.
In those early days we had a planning process involving a number of UNESCO expert groups, and also very heavily internal work in the Library of Congress. We asked ourselves very hard what would make a WDL worth doing, because we did face a burden of skepticism: "Well, yes, Dr. Billington is a famous librarian, a very important person in the library world, and he made this proposal, but what's going to be the value added of the WDL? Why should people join and participate?" We came up with three things: multilingualism, universality, and a high level of functionality and added value. We have been working to implement all three ever since. I'll focus today on multilingualism, because that's the key to our topic here.
As Noha said, the site is available in seven languages, meaning the interface is available in seven languages: the six official UN languages plus Portuguese. The content is in 114 languages so far. So, to Noha's question: the interface is in major, big languages, although we do have some gaps there, particularly South Asian languages, and most of the content is in heavily spoken languages: Arabic, Chinese, Russian, English, German, and some dead languages, Latin, Greek and so on, because of their cultural importance. But we do try to focus on lesser-known languages: Indian languages from South Africa, African languages and so on. Here you see the top languages in the world by number of speakers and by number of Internet users. This is always changing; this is 2012 data, so it's probably out of date by now.
But you can see there's a pretty good correspondence between spoken languages and Internet languages. And this is the actual usage of the World Digital Library. This surprised us; we didn't predict it. 46% of usage is in Spanish, followed by English and Portuguese, and Arabic has actually moved up to fourth. There's a big gap between the big three, so to speak, and the other four. But we have seen tremendous growth in Arabic in particular, and Chinese is growing. We have a lot of work to do to get some of these languages, which are more difficult at least from our point of view, better exposed.
But it's growing. So Arabic has actually grown quite a bit.
Here you see usage by country: we have heavy usage in Spain, the U.S., Brazil, Mexico and China. But you can see it's pretty well distributed across all of these. You can see why the Spanish share is so high: we have a number of Spanish-speaking countries, Spain, Mexico, Colombia, Peru and so on, with very heavy usage in Chile. But it's becoming fairly widespread: we have a couple of Arabic-speaking countries, China of course, and Brazil is very high. So we're getting pretty good usage coverage from around the world.
A little bit about universality: we made a point of having some content about all UN Member Countries.
We now have 183 partners in 81 countries, and our goal is to get at least one partner in every country. By partner I mean an institution, a library, a museum or whatever that can contribute content.
Here you see who the partners tend to be. There's a pretty good distribution, very strong coverage in western Europe and North America, but also East Asia and so on. Now, capacity building: this gets to the issue of local content creation that Dr. Adly raised.
One of the biggest obstacles to creating local content to show on the World Digital Library, and by that I mean digitizing local material, writing descriptions of it, creating metadata about it and so on, is capacity.
We have raised money from foundations to set up digital conversion centres in three places: at the national library and archives of Egypt in Cairo, in Baghdad, Iraq, and in Uganda. Here you see some photos of the activities underway in those countries. We would like to do a lot more of this. When I say we, it doesn't all necessarily need to be done through the Library of Congress; it could be done through UNESCO or ministries of development or whatever. We see a need for a lot more capacity building in the developing world, although of course there are competing needs, health and disasters and so on.
Functionality and added value: I won't spend time on this, because I want you to actually go to the site. High-quality metadata, interpretation, browse, search, zoom, full-text search, exposure to search engines, and text-to-speech conversion, which is actually relevant to multilingualism. You can read; you can listen. This is helpful for people who have learning disabilities. You can listen to all of the metadata on the site.
Translation model. I'll finish.
We translate everything. The aim is to have an equivalent user experience in each of the languages. So it's not like Wikipedia where every version of Wikipedia is a little bit different and there's no uniformity.
We want to have a uniform experience here so that what you see on the English site is the same as you see on the Arabic site and so on.
This top one is very important. User access is keyed to the language of the browser. So if somebody goes on the World Digital Library in France, as I did two weeks ago in Lyon, it comes up as a French Web site, and in Brazil it's a Portuguese Web site. If you don't happen to be in a country that uses one of the interface languages, it's not clear what it comes up with; usually English, I suppose. That's important, because many people who use this library have no idea that it's a multilingual Web site; they think it's a Portuguese or a Spanish Web site. That's good, but it puts a heavy burden on us, because the quality has to be good. We don't get credit for being an English Web site that also adds features in foreign languages; people want it to be good, and if it's not, they complain. I'm running out of time here, so let me quickly say that we would like to add additional interface languages. Going back to those early slides, the biggest gap we have is South Asian languages. There's a pretty good correspondence between our seven and the number of speakers in the world, but not when it comes to South Asia: there are languages like Hindi, which hundreds of millions of people speak, that aren't on the World Digital Library. We would like to add those at some point, but right now we have enough trouble on our hands just handling the seven that we have.
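The talk does not say how the WDL implements this. As a hedged sketch, here is how a site could choose among seven interface languages (the six UN languages plus Portuguese) from the browser's HTTP `Accept-Language` header, honouring quality values (`q`). The `SUPPORTED` set, the English fallback (the speaker only guesses "usually English"), and the function name are assumptions; a production site would also handle script subtags such as `zh-Hant` and let users override the automatic choice.

```python
from typing import List, Tuple

# Assumed configuration: the seven interface languages named in the talk.
SUPPORTED = {"ar", "zh", "en", "fr", "ru", "es", "pt"}
DEFAULT = "en"  # assumption: the talk only guesses "usually English"


def pick_interface_language(accept_language: str) -> str:
    """Choose an interface language from an HTTP Accept-Language header value."""
    prefs: List[Tuple[float, int, str]] = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # a missing q-value means full preference
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:  # q=0 means the user explicitly does not want this language
            prefs.append((q, position, tag))
    # Highest weight first; ties keep the order the browser sent them in.
    for _, _, tag in sorted(prefs, key=lambda p: (-p[0], p[1])):
        primary = tag.split("-")[0]  # "pt-BR" -> "pt"
        if primary in SUPPORTED:
            return primary
    return DEFAULT
```

A Brazilian browser sending `pt-BR` gets `"pt"`; a German browser sending `de-DE,de;q=0.9,es;q=0.5` gets Spanish, its best supported option; a header listing no supported language falls back to English.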
Translation method: I won't go into this, except to say there's a very sharp contrast between what we're doing and the UNL system that Dr. Serageldin talked about. This is a much smaller effort; we have done 10,555 items so far. So we use machine-assisted translation, not machine translation: we use the Trados system with very large translation memories.
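Machine-assisted translation with a translation memory typically means retrieving previously translated segments that closely match a new one and offering them to a human translator to edit. The talk does not describe how matches are scored, so the following is only a minimal sketch of fuzzy translation-memory lookup using Python's standard-library `difflib.SequenceMatcher`. The similarity measure, the 0.75 threshold, the memory entries and the `tm_lookup` name are all assumptions; commercial tools use their own scoring.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical translation memory: English source segment -> approved Spanish translation.
MEMORY: Dict[str, str] = {
    "Map of the city of Lima": "Mapa de la ciudad de Lima",
    "Atlas of the world": "Atlas del mundo",
}


def tm_lookup(segment: str, memory: Dict[str, str],
              threshold: float = 0.75) -> Optional[Tuple[str, str, float]]:
    """Return (source, stored translation, similarity) for the closest match, or None.

    A hit is a suggestion for a human translator to edit, not a finished translation.
    """
    best: Optional[Tuple[str, str, float]] = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best
```

Looking up "Map of the city of Cusco" returns the Lima entry with a similarity of about 0.81, so the translator only has to change the city name; an exact repeat scores 1.0 and can be reused as is.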
And we use an authoritative English record as the pivot: everything goes into English and then into the other languages. We don't translate Arabic to Chinese or whatever. And we use Trados and so on. Just to finish, content. Please go on the site. I'll finish with this: here is a Mayan Codex from 1200 from a library in Germany. Here is a Damascus Bible, originally made in Spain, from the National Library of Israel. Here is a Bible, one of the most expensive books made, from extensive Modai (phonetic). Here we have something from Burberry in the States. This is from the national archives in Iran, which is one of our partners. Here is an atlas from the National Library of Spain, an old map of Mexico City. A Chinese map. This is a scientific work in Persian from Yale University Library. An astronomical work in Arabic. Another work from Iran, the famous Book of Kings, and a lot of nature stuff.
We don't yet have a Turkish partner. If there are any Turks out there, we would love to get Turkey in this, but we have other items -- an atlas and other things from the Turkish library in Baltimore. So that's basically it. Moving forward, we're adding content, we're trying to get more partners, we're trying to engage a wider audience, and we're adding interpretive and thematic materials on this new beta Web site, which I would encourage you to take a look at. So that's the overall picture, and I look forward to the discussion.
>> DR. NOHA ADLY: Thank you very much, John, for this quick tour of the World Digital Library, with its emphasis on diverse content, specifically multilingualism and the 114 languages of content, which extends your reach. From that I would like to call on Dr. Indrajit Banerjee, an expert in the social impact of Information and Communication Technology, who has just taken office as Director of UNESCO's Knowledge Societies Division.
UNESCO plays a vital role in maintaining diversity in cyberspace. So we would like to hear your opinion: what are the major challenges for multilingualism in cyberspace, and what measures can be used to encourage the creation, dissemination and use of multilingual content in cyberspace?
>> INDRAJIT BANERJEE: Thank you. And I thank the organisers for giving me the opportunity of being present here. Of course, I'm no librarian of the caliber of Dr. Serageldin and Dr. John here, so my presentation will be rather different. Basically, my objective will be to show what UNESCO has been doing to promote multilingualism, especially in cyberspace. We have faced numerous challenges, but I think we have made significant progress in the past few years, as you shall see.
Now, this is just a recap of how many significant documents have been ratified and how many resolutions have been passed by UNESCO, to highlight the importance we give to the whole question of multilingualism and multiculturalism at various levels, whether it be tangible heritage or world heritage sites, of course, and increasingly so in the domain of languages.
And you see the one marked in orange there, which is the latest recommendation, on the promotion and use of multilingualism in cyberspace.
Now what you notice, if we go back and map technological developments as well as societal, political, economic and environmental changes, is that numerous dependencies, situations, structures and practices have been created. In this context, as all of you know, in the era of globalization the whole notion of multilingualism and multiculturalism must be revisited; we can't look at it the same way we did 50 years ago. Increasingly, people are multilingual, and a lot of people belong to several cultures at the same time. This of course transforms the situation in which we deal with the whole question of multilingualism. We see very clearly that in the past there has been a tendency to be technology-centred. And I think in Dr. Serageldin's presentation it was interesting to see how much attention they had paid to the users in preparing all of this and in providing access to all of this data.
And we believe that our response to the challenges of protecting rights on the Internet must be increasingly people-centred. As you'll see as I go along, numerous initiatives have been taken by UNESCO to ensure a more people-centred approach to multilingualism.
Now we are moving towards the end of the Millennium Development Goals. Many people have of course commented that in the MDGs no place was made for culture, although almost every one of those goals is deeply linked to cultural issues, cultural contexts, cultural environments and cultural practices. As far as the current debates on the post-2015 Sustainable Development Goals agenda are concerned, we hope that linguistic and cultural diversity can be mainstreamed across all of the new development goals. UNESCO has taken a very interesting approach since the 2005 World Summit on the Information Society: we decided to mainstream cultural and linguistic diversity in our overall approach. As some of you will know, at the 2005 World Summit UNESCO came up with the concept of knowledge societies, arguing that the Information Society was not the appropriate concept, because information was just the means and the tool, and the ultimate goal was knowledge; effectively, when you look at development today, most development in any field is driven by the knowledge dividend. So we have this concept, which has of course evolved since 2005.
We have four pillars: education for all; cultural and linguistic diversity; access to information and knowledge; and freedom of expression.
This itself obliges UNESCO and all of our partner agencies to pay great attention to cultural diversity, because we consider it one of the key pillars of the knowledge societies that we wish to build.
To respond to your question, in our work we actually have four clear approaches and strengths. One is the use of normative instruments, and this is fundamental, because we have seen on many occasions in the past that UNESCO can beg and plead with Member States to do something, but as long as there's no binding or normative instrument, nothing much gets done, because it's left to Member States to do what they want. So we have normative instruments; we have very significant research outputs on cultural and linguistic diversity; we have partnerships with several agencies; and we have some special initiatives.
I will skip through the normative instruments, but it's extremely important to note that our recommendation on the use and promotion of multilingualism in cyberspace is now a binding instrument: all Member States have to submit reports to us every two years on what they have done to promote multilingualism in their respective countries. This recommendation is the only normative instrument of its kind at UNESCO, and we invite, but also to a certain extent oblige, our Member States to take concrete measures for the promotion of multilingualism.
Over the years we have of course asked our Member Countries to submit reports on what has been done and on which languages are in danger. I'll take this opportunity to point you to our World Atlas of Endangered Languages, a unique initiative by UNESCO which shows which languages around the world are in danger and are likely to disappear. This overall map gives you a sense of how severely languages are threatened, not only in cyberspace but in reality. In terms of research, we did a very interesting study with the OECD and ISOC. The idea was to convince governments that having local content is beneficial not only in itself and for the promotion of multilingualism, but also because the more local content you have, the lower the cost of access becomes.
And you can see, for example, that in countries like India, which is my country, the reason there's been such an explosion of media is simply the number of large language groups we have, so you have 400, 500, 600 television channels, many in local languages, because they are fairly large and very vibrant language groups.
In terms of international cooperation, we have this partnership looking at top-level domains, ccTLDs, to encourage more and more languages to have their own Internet domain names. That by itself may not promote local content, but I think it's an incentive for people to have their own domain names and populate them with local content. This is the atlas I mentioned to you. We are now working to completely revamp and update this atlas, and it should give you and all Member States a fairly good sense of how severe the threat to languages is.
I believe that, out of the 6,000 officially recognized languages in the world, only 2,000 are online today, so we're also trying to do more at the UNESCO level with our Member States to ensure that more and more languages are present online.
I just wanted to point out that there's International Mother Language Day, which we celebrate, and we strongly recommend that all of you celebrate it, because I think the mother tongue has always been an extremely important foundation on which to build in order to promote multilingualism and linguistic and cultural diversity.
This is our Web site; please visit it. You can see all of the initiatives we have undertaken to promote cultural and linguistic diversity online. We have several initiatives. Thank you for your attention.
>> DR. NOHA ADLY: Thank you, Indrajit, for this enriching intervention showing us UNESCO's efforts in this area. It's obvious that you are tackling it through several approaches, combining community research with international cooperation and special initiatives, among other things. This is a very enriching way of achieving it.
Our next speaker is Dr. Elycia Wallis from Australia. But I don't know whether -- do we have the connection? We don't have the connection. So we'll skip the presentation.
Unfortunately, she was supposed to be connecting remotely. However, there is a problem with the technical equipment here and we're not able to establish the connection.
So I will switch now to Dr. Lorrayne Porciuncula from the OECD. Lorrayne is an analyst on broadband policy in the Digital Economy Policy Division at the OECD.
What I would like to ask Lorrayne is this: the OECD is very well known for its expertise on Public Sector Information, and Public Sector Information is yet another type of content, which generates benefits across the economy, or what is now called the digital economy.
So, given this great expertise of the OECD, we would like you to share with us the recent developments in government PSI-related policies and initiatives and, now with the growth of Big Data, how Big Data affects the creation and use of PSI, Public Sector Information.
>> LORRAYNE PORCIUNCULA: Thank you very much, and good morning to the audience and to those watching us remotely. I'd like to start by talking about the OECD's framework for the knowledge economy, then local content, and then PSI, Public Sector Information.
So as economies become more knowledge-intensive and information-rich activities spread across different sectors, data and information become crucial raw materials in the production chain of value-added goods and services. The aggregate benefits of making data and information more easily available have been increasingly recognized and backed by evidence.
So this evidence points to the relationship between reducing barriers and innovation.
And the amazing role that the Internet takes in this regard is providing extremely effective and means for disseminating knowledge and especially new knowledge.
So the Internet represents both the possibility of expanding opportunities for individuals and firms to access relevant knowledge produced by others independently of their own investment in new knowledge and on the opportunity to collaborate with others and share their own bottom-up innovations with the world so this is what we call inclusive innovation.
What's happening here? So this is what we call inclusive innovation which is so important for bottom-up economies for developing economies. So we have empiric evidence of the positive impact of industries production on firm's productivity and also on the digitalization of industries providing larger benefits for exactly those firms that commonly face greater obstacles for engaging in innovation. Such as small firms, SMEs.
We have done several pieces of work on local content; one of them was mentioned by Dr. Banerjee earlier. We have policy guidance on digital content which remains extremely up to date even though it is from 2008. Because knowledge is so important in the economy and society, barriers to access due to language, culture, economics or literacy should be overcome, because they hinder economic aims. Those barriers are too high, and the benefits that organisations would gain from the use and reuse of information must be weighed against the cost of keeping those barriers in place.
So in this OECD work with UNESCO and ISOC, I should highlight two of the main findings.
So one is that local content is growing at astonishing rates. And the other one is that infrastructure matters, too.
Content not only drives infrastructure investment; infrastructure in turn promotes the more efficient local distribution of content and delivers content out to the world.
So infrastructure helps not only with local distribution but also with making local content global.
And while market players have a role in creating and developing digital content business models, we also need to think about the role governments have in creating enabling factors for the creation and use of local content.
Governments should take measures to support cultural diversity and local partnerships, for example, building enhanced capabilities and removing barriers and other impediments to local content distribution. They should also promote competition to lower access prices,
and improve connectivity and network infrastructure.
One example is the case of Nepal. After the first year of operation of their Internet exchange point, they observed that the peak of traffic in that first year came exactly when the grades from local public schools were released.
That supports the evidence that investing in infrastructure drives local content, and that this kind of content is one of the most relevant and useful for individuals. By its nature, content produced by the Public Sector is local and relevant.
That's why it is so important that governments become role models in the creation and distribution of local content. And the potential of unleashing Public Sector information is quite high.
We have an OECD Recommendation on PSI from 2008 that has now been reviewed, and the revised version will be published soon. We have also estimated the size of the PSI market in the OECD economies. Although I cannot release the data, it is measured in billions. Not only that, but the economic impact of unleashing PSI is also measured in many billions.
Beyond this direct economic impact, when you think about increased transparency, trust, efficiency and quality of public services, and improved linkages between government and citizens, unleashing PSI and opening data is an extremely smart strategy for any government.
It reduces costs for both citizens and businesses. For example, it has been estimated how much time a citizen would gain from better public services.
In Norway, the gain would be two hours a year per citizen, and measured economically this is also counted in billions. And the production and release of Public Sector information becomes a direct input for new business models.
So not only business models but also new services and products are enhanced, bringing economic gains and overall social gains.
There are good practices around the world. One of the best known is gov.uk. There are other examples, not only of making public information and services systematically available and easy for citizens to understand, but also platforms such as challenge.gov that bring people together to crowdsource solutions for the government, so that citizens take part in the co-production of solutions to different challenges. There are many examples of making data easily available and providing a platform where people can build applications on this data and produce new products, whether for the market or for the Public Sector itself. There is also the case of Chile, for example, which has also systematized Public Sector services and information.
A case I find very interesting is the example of the German Government. It's called Leichte Sprache, which means easy German, and what they did is very interesting. Many government Web sites already have tools to enlarge or reduce the text. What they did was add an option for people to choose easy German, which is especially relevant for people with low literacy and for immigrant populations, who are the ones who would benefit most from access to Public Sector information and services. Now, some strategic perspectives for local content.
The idea is that there are some low-cost normative actions that can be taken to unleash existing opportunities, seizing the low-hanging fruit.
You should also target high-yield, systemic actions such as e-government platforms, eHealth, and the production of maps using population, traffic and weather data.
You should also take hidden issues into account, because issues like security and privacy can hinder value creation when you are working on a PSI strategy. We also have problems finding internationally comparable measurements. But we can continue that in the discussion. I'm at the end of my time. Thank you very much.
>> DR. NOHA ADLY: Thank you, Lorrayne, and thank you for keeping to time. I'm sure that in the discussion questions will come up regarding the different items and ideas that you have presented.
Now we would like to turn to Dr. Haidar Fraihat, who is Director of the Technology for Development Division at UN-ESCWA in Beirut. Before that, he held two government posts in Jordan, including at the Information Technology Centre, where he served as the government CIO.
ESCWA is one of the most prominent entities in the Arab region and has been promoting Arabic content, so we would like to hear your opinion: what is the most promising sector in the Arabic content industry, what are the most notable challenges preventing the Arab region from becoming the leader in the development of digital Arabic content, and what efforts is ESCWA making to overcome these challenges and promote Arabic content? Thank you.
>> DR. HAIDAR FRAIHAT: Thank you very much. I will try to answer your very legitimate questions through this presentation.
I'll say something about the background of digital content in the Arab world, then speak about digital Arabic content, and finish with some recommendations.
First, the importance of digital Arabic content. Let me share this information with you. There are 22 countries in the world that we call the Arab countries, members of the League of Arab States. In all of them the official language is Arabic, so we have 22 countries of the world community whose official language is one and the same. There are many other countries where Arabic is a second language, either by constitution or by practice.
There are many issues here. My previous colleagues spoke about the spectrum of languages in general, so please allow me to zoom in on the Arabic language for two purposes: first, to treat one language as a case study, and second, to give the audience some idea of what's going on in this region.
Preserving the Arabic language and identity online is one thing; enriching content for the Internet in general, on various devices, is another.
One study indicated that 90% of Arabic speakers, whether in the Arab countries or elsewhere, prefer to read material on the Internet in their native language, their mother tongue, Arabic.
Youth unemployment is another issue. We have looked at promoting digital Arabic content on the Internet as one way of fighting and mitigating unemployment, because young people are drawn to work in content creation, content consumption and so on.
So we think promoting digital Arabic content is one way of dealing with this notorious problem of unemployment among youth, especially young women.
Here are some numbers. The digital content industry has grown globally from $731 billion to $885 billion. In the Arab region it has grown from $21 billion to $26 billion between 2011 and now, a span of three or four years, let's say.
The digital Arabic content industry in the Arab region has grown from $3.5 billion to $5.3 billion.
We are talking about a region with a population of 350 million. As these numbers show, there is potential for growth across all languages in the region, and for the Arabic language in the Arab region in particular.
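The figures quoted above can be turned into growth rates with a few lines of arithmetic. This is a minimal sketch, assuming the dollar amounts exactly as stated in the talk; no period is given for the global figures, so only total growth (not an annual rate) is computed, and the share calculation assumes digital Arabic content is a subset of the regional digital content market over the same period.

```python
# Figures as quoted in the talk, in billions of US dollars: (earlier, later).
# Only total growth is computed, because no period is stated for the global figures.
FIGURES = {
    "Global digital content industry": (731.0, 885.0),
    "Digital content industry, Arab region": (21.0, 26.0),
    "Digital Arabic content, Arab region": (3.5, 5.3),
}


def growth_pct(start: float, end: float) -> float:
    """Total percentage growth from start to end."""
    return (end - start) / start * 100.0


def share_pct(part: float, whole: float) -> float:
    """Part expressed as a percentage of whole."""
    return part / whole * 100.0


if __name__ == "__main__":
    for name, (start, end) in FIGURES.items():
        print(f"{name}: +{growth_pct(start, end):.1f}%")
    arabic = FIGURES["Digital Arabic content, Arab region"]
    region = FIGURES["Digital content industry, Arab region"]
    # Assumes the Arabic figures are part of the regional market and cover the same years.
    print(f"Arabic share of regional market: "
          f"{share_pct(arabic[0], region[0]):.1f}% -> {share_pct(arabic[1], region[1]):.1f}%")
```

On these numbers the global industry grew by about 21.1%, the regional market by about 23.8%, and digital Arabic content by about 51.4%, so under the stated assumptions the Arabic share of the regional market rose from roughly 16.7% to 20.4%.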
I don't want to go through definitions, because the definition of digital Arabic content is similar to the definition of, let's say, digital English, German or Chinese content, or any other.
Now, the mandate: why are we all here? Because there are some global mandates, some UN mandates. As some of my previous colleagues mentioned, the WSIS, the World Summit on the Information Society, provides a mandate through the Geneva Plan of Action and its main Action Lines, for example Action Line C8 on cultural diversity and identity, linguistic diversity and local content.
There are also other Action Lines: C3, access to information and knowledge; C4, capacity building; and C7, ICT applications. Now we are at a crossroads for WSIS, in 2014-2015. There is a lot of debate now on where to go and what's next.
The current WSIS process is about to finish. The whole world community is now about to make a huge strategic decision on where to go: how UN processes like WSIS should continue. Should we continue? Should we merge WSIS with other UN processes such as the MDGs and SDGs, and with other processes such as this very process that we have here today, the IGF process?
So in 2014-2015 we are at a crossroads regarding how these UN and other international processes move forward.
Now I want to address the second issue, which is what ESCWA is doing in the area of digital Arabic content. As many of you know, ESCWA is a United Nations organisation operating in the Arab region; of the 22 Arab countries, 17 are members of ESCWA. We are here to promote social and economic development in the region. When we speak about the Knowledge Society and the Information Society, and about promoting the digital economy, information economy or Knowledge Economy, ESCWA is here to promote digital Arabic content as a means of promoting the economic and social prosperity of the region.
Let me share with you some of the activities and outputs ESCWA has produced. In 2003 there was the Digital Arabic Content Initiative, and also in 2003 work on enhancing Arabic content on digital networks. In 2005, digital Arabic content opportunities, priorities and strategies. These are studies and workshops: in 2007, a virtual workshop; in 2010, a survey on digital Arabic content, software applications and assessment.
In 2010 there was also a study on models for business plans, marketing and multistakeholder partnerships; in 2011, mechanisms for community-driven interactive Arabic multimedia content; and in 2012, the status of the digital Arabic content industry in the Arab region. In 2013 we produced a leaflet on the digital Arabic content industry,
as well as a study on business models for digital Arabic content.
In 2014, a policy note on digital Arabic content strategies.
These are pictures of some of the things we have produced. Most of our publications are bilingual, Arabic-English, and sometimes trilingual with French. We also promote the digital Arabic content industry through incubation. One track we took in the Arab region is to build a bridge between producing digital Arabic content and youth and incubation, involving young people, crowdsourcing, ordinary citizens, societies, NGOs, organisations and universities in this process of promoting digital Arabic content.
We also have projects: a couple of projects on promoting digital Arabic content. We now have a project proposal for which we envisage between half a million and 1 billion dollars in order to move ahead with our efforts in promoting digital Arabic content.
Another initiative is national digital Arabic content competitions. We are organising competitions in the Arab world to encourage communities and individuals to compete for awards and recognition in the area of digital Arabic content.
These are some of our project partners in the region, like ADU in Abu Dhabi, IPARC (phonetic) in Jordan, partners in Lebanon and Tunisia, and Tajic (phonetic) in Egypt.
Now I will move to my last part, because I'm running out of time: some recommendations. We have recommendations for governments in formulating policies and strategies for digital Arabic content. We have noticed that most Arab countries make plans for digital Arabic content separately, alone, in silos, without coordinating with their neighbouring Arab countries, whereas this digital Arabic content endeavour should be dealt with as a transnational effort.
The Private Sector also has a huge stake in this. For them, producing digital Arabic content is a money-making activity.
This is not bad; this is good. We want to make sure that their business models coincide with the promotion and creation of digital Arabic content.
Also Civil Society.
I will stop here. Thank you very much.
>> DR. NOHA ADLY: Okay, now I would like to invite Mr. Makane Faye. Actually, before we start, I just want to announce that Dr. Wallis has sent us a video of her presentation, so we are going to share this video with everyone after Mr. Faye's presentation. Mr. Faye is Chief of the Knowledge Management Section and Digital Library Services at the UN Economic Commission for Africa. He has served for over 23 years with the Commission, where he promotes the use of emerging strategies in knowledge management, information systems and library service development.
Now, Mr. Faye, UN ECA has been in existence for 50 years, and it has created a vast quantity of information and knowledge in a variety of formats. Could you share with us how this local content can contribute to the promotion of sustainable development in Africa, and what efforts UN ECA has made in promoting cultural and linguistic diversity?
>> MAKANE FAYE: Thank you, Madam Chairperson. First of all, let me thank the Government for giving me the opportunity to be part of this workshop. I also thank the panelists for their brilliant and informative presentations.
As you have said, the Economic Commission for Africa has been in existence for the past 50 years. What we wanted to do is to make sure that all of the information created by ECA and its Member States, through working papers, reports, policy briefs, speeches and UN resolutions, is digitized and made available. Before we started, as Dr. Serageldin knows very well, ECA had been working with our secretaries to assist in this effort.
So I think it is a good sign that I'm sitting on the same panel as the BA Director. The objective is to put in place a repository that provides an online mechanism for collecting, preserving and disseminating ECA publications in digital format.
Since ECA's founding, this is how the library looked, or how information was shared, and I'm sure Dr. Ismail has seen it. Where we are now is a repository where everything is being digitized. We started this project in 2007, and digitization began in 2009. The aim was the systematic collection and dissemination of ECA's intellectual output, to increase access to our knowledge and also to increase the impact of our work.
This was linked to two other activities. What is a portal called access to scientific and socioeconomic knowledge in Africa and the African Virtual Library and Information Network.
The projects were launched in 2011 in one of our governing mechanisms, the Committee on Development information science and technology.
What we did first was to take our online catalogue and import its metadata into the institutional repository (IR), which is being developed using --. It is organised into 13 themes and 21 collections based on our work programme. After importing the metadata, we looked for the full-text documents to upload them as digital files.
When you go to the IR, you have these options: you can browse by title, by date, by communities and collections, by subject keywords, and by collection. You can do a simple search, an advanced search or a subject search, and we also support harvesting to link the IR to other collections.
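The two-step workflow described above (import catalogue metadata first, then attach full-text files) and the browse and simple-search options can be sketched as a tiny in-memory model. This is a minimal, hypothetical illustration: the `Record` and `Repository` classes, their field names and the sample titles are assumptions for the example, not ECA's actual schema or the repository software mentioned in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One repository item. Field names are hypothetical, not ECA's schema."""
    title: str
    year: int
    theme: str                        # the talk mentions 13 themes
    collection: str                   # ... and 21 collections
    subjects: list[str] = field(default_factory=list)
    full_text: Optional[str] = None   # path of the uploaded file, set in step 2


class Repository:
    def __init__(self) -> None:
        self.records: list[Record] = []

    def import_metadata(self, rows: list[dict]) -> None:
        """Step 1: create records from catalogue metadata, with no file yet."""
        for row in rows:
            self.records.append(Record(**row))

    def attach_full_text(self, title: str, path: str) -> bool:
        """Step 2: attach a digitized file to the first matching record that lacks one."""
        for rec in self.records:
            if rec.title == title and rec.full_text is None:
                rec.full_text = path
                return True
        return False

    def pending(self) -> list[str]:
        """Titles still waiting for their full text."""
        return [r.title for r in self.records if r.full_text is None]

    def browse_by(self, attr: str) -> dict:
        """Group titles by one attribute, e.g. 'year', 'theme' or 'collection'."""
        groups = defaultdict(list)
        for rec in self.records:
            groups[getattr(rec, attr)].append(rec.title)
        return dict(groups)

    def simple_search(self, term: str) -> list[str]:
        """Case-insensitive substring match on titles and subject keywords."""
        t = term.lower()
        return [r.title for r in self.records
                if t in r.title.lower() or any(t in s.lower() for s in r.subjects)]


if __name__ == "__main__":
    repo = Repository()
    repo.import_metadata([
        {"title": "Sample report A", "year": 2012, "theme": "Economic development",
         "collection": "Reports", "subjects": ["trade", "growth"]},
        {"title": "Sample paper B", "year": 2013, "theme": "Statistics",
         "collection": "Working papers", "subjects": ["economic statistics"]},
    ])
    repo.attach_full_text("Sample report A", "files/report_a.pdf")
    print(repo.pending())                     # ['Sample paper B']
    print(repo.browse_by("theme"))
    print(repo.simple_search("statistics"))   # ['Sample paper B']
```

Keeping metadata import separate from file upload mirrors the order described in the talk: records become browsable as soon as the catalogue is imported, and `pending()` shows which items still need their digitized full text.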
This is the IR as it currently looks. It is available at repository.UNECA.org. We have publications online in English and French covering these 13 themes.
And these are the top countries visiting or using the IR.
The top one is the United States, followed by France. From Africa, only three countries are in the top ten: Nigeria, South Africa and Ethiopia.
We also have the top ten downloads. The first one was on economic development. And the last one was on statistical development and economic statistics.
The challenges we have are linked to connectivity and bandwidth in Member States, because you have to access the IR through the Internet. As you know, some countries have very limited bandwidth for accessing the IR from the workplace, especially during working hours, when they have access problems. And when people go home, of course, they have to use their own money to access it, and that creates a problem. We also have what we call the SROs, the sub-regional offices of ECA. We have five of them, located in five sub-regions, including the one for North Africa, which is in Morocco.
Some of those sub-regional offices also don't have qualified staff who can really use and process the information.
So what we are planning to do is to have a replica at the sub-regional offices, and also what we call an IR in a box, where we would put the content on DVDs and videos to distribute to research institutions in Africa. They can use it there, and it can be updated regularly and sent back to them.
Future development focuses on multilingual capabilities. Currently all of the documents, 19,000 of them, in English and French, are uploaded, but searching is still done in English. We are currently working to make sure that searching can also be done in French without any problem. We are also planning to enhance the statistics, and we regularly work on upgrading the system.
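One common obstacle to searching French text is accented characters: a query typed as "developpement" will not match "développement" with plain lower-casing. The sketch below shows accent-insensitive matching using Python's standard `unicodedata` module. It is an assumed technique for illustration only; the talk does not say how ECA's French search is being implemented, and the `fold` and `search` helpers are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold and strip diacritics, so 'Développement' and 'developpement' compare equal."""
    # NFD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(titles: list[str], query: str) -> list[str]:
    """Return the titles that contain the query, ignoring case and accents."""
    q = fold(query)
    return [t for t in titles if q in fold(t)]


if __name__ == "__main__":
    titles = ["Rapport sur le développement économique", "Trade report"]
    print(search(titles, "DEVELOPPEMENT"))
```

Folding is applied to both the stored text and the query, so users can search with or without accents; a production system would typically do this once at indexing time rather than on every query.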
So Madam Chair, this is what I had for the time being. And it is available at repository.UNECA.org. Thank you.
>> DR. NOHA ADLY: Thank you very much, Mr. Faye, for sharing with us what ECA is doing in terms of diversity of content and of access. Now we turn to another type of diversity with the Biodiversity Heritage Library, where diversity covers content, locations and cultures. The Biodiversity Heritage Library is also an initiative that promotes community participation in the creation of content.
We hope Dr. Wallis will share with us how they have addressed the different kinds of diversity in the BHL, and what the most rewarding aspects of working with the global community have been. So, if we can get Dr. Wallis's presentation.
>> DR. ELYCIA WALLIS: On the 2nd of September 2014. Thank you very much for watching. I am sorry I could not meet you all in Turkey, but I'm glad for the technology that allows me to join you. Today I would like to talk about the Biodiversity Heritage Library. This project seeks to digitize and provide full-text access to taxonomic literature, but it now contains much more than that.
The Biodiversity Heritage Library was started in 2006 as a consortium of libraries in the United States and the United Kingdom. Since then we have reached out to communities around the world. We now work with China, India, Brazil, South Africa and Australia, and most recently with Kenya and Singapore.
Each operates with its own governance structure and objectives, but the overall project goal is shared by all.
The focus is on full-text scanning, which provides access to published books. There are now over 44 million pages available through the central Web site, and still more on separate Web sites.
Recently, digitization in the project has also focused on copyright and on how to seek permission.
Access is provided through several Web sites. This image shows the central Web site. Content is initially uploaded to and housed on the Internet Archive, and served from there. Individual items can also be queried. The Web site contains the graphs and images within the texts. You can select pages or download a paper.
For example, this site is from China and comes from our colleagues at the Chinese Academy of Sciences. It covers the process of digitizing texts as well as providing tools for Chinese researchers.
Similarly, they provide access to content in 20 different languages.
This site provides access to some texts in Arabic and is run by our colleagues in Alexandria. Particularly impressive is the work these colleagues have initiated through this process.
This has been successful.
You can download it. It's a digital process.
Most recently, the community has started to digitize archival materials such as field notes. This puts us in new territory. Often these types of texts are handwritten; they are harder to access and require a different methodology.
Field notes are copied by transcriptionists. Transcription is very valuable for research materials.
Transcription is a time-consuming task that is not easy to automate, so a number of sites seek assistance from volunteers.
One such site is shown here. It's called DigiVol, and it is run by a national biodiversity data aggregator in Australia called the Atlas of Living Australia, in collaboration with the Australian Museum. Volunteers working on the site are very highly motivated, and this group of people can help make knowledge available through powerful but simple Internet tools.
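Because transcription of handwritten field notes is hard to automate, crowdsourcing sites need some way to check volunteer work. One generic approach, shown here as an assumed technique and not as DigiVol's actual process, is to have several volunteers transcribe the same page and accept a line only when enough of them agree, sending the rest to a human validator. The `reconcile` function and its `min_agree` parameter are hypothetical names for this sketch.

```python
from collections import Counter
from typing import Optional


def reconcile(transcriptions: list[str], min_agree: int = 2) -> list[Optional[str]]:
    """Line-by-line majority vote over volunteer transcriptions of one page.

    Expects at least one transcription. Returns the agreed text for each line,
    or None where fewer than `min_agree` volunteers produced the same text;
    those lines would go to a human validator.
    """
    split = [t.splitlines() for t in transcriptions]
    n_lines = max(len(lines) for lines in split)
    result: list[Optional[str]] = []
    for i in range(n_lines):
        # Collapse runs of whitespace so spacing differences do not split the vote.
        votes = Counter(" ".join(lines[i].split()) for lines in split if i < len(lines))
        text, count = votes.most_common(1)[0]
        result.append(text if count >= min_agree else None)
    return result


if __name__ == "__main__":
    page = [
        "Collected at creek\n12 specimens",
        "Collected at  creek\n12 specimans",
        "Collected at creek\n21 specimens",
    ]
    print(reconcile(page))
```

In this example all three volunteers agree on the first line once spacing is normalised, but no two agree on the second, so that line is left for an expert to check.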
The Biodiversity Heritage Library has achieved many successes but still faces challenges. Principal among these is achieving a sustainable funding model that can continue into the future.
There are also technical challenges in the workflow of digitizing and uploading literature.
Synchronizing content and metadata between the different Web sites continues to require creative thought.
Then there is the issue of how to attract new partners, how best to serve the existing ones, and how to address geographic gaps where there are no doubt excellent libraries that might be willing to participate.
And there is the very real and challenging issue of copyright compliance even for what seems like old literature published say in the 1950s or 1960s.
But despite the challenges, as there are for every project, the real strength of the project is the many countries, cultures and languages it has brought together. The images on this slide show our global collaborators meeting in Germany, Morocco and Australia this year.
And I would like to acknowledge Dr. Adly shown on the top slide for her kind invitation to me to join you today.
A project such as this requires global and local groups to work together towards a shared goal. A global collaborative project such as BHL is achievable, and social sharing as well as innovative technology allow us to do that. Thank you very much for your time today.
>> DR. NOHA ADLY: Well, thank you very much. I think she can hear us and see us; the problem is that we cannot hear her, but we have heard her through the video. Thank you very much for the presentation, and for showing how the BHL is an example of using crowdsourcing to build a very rich encyclopedia that is very useful for both the public and researchers. I think we are beyond our time; we have passed 10:30. However, I would like to open the floor to questions from the audience, or from panelists who would like to raise questions with other panelists.
So we can take maybe a couple of minutes for any questions or comments.
>> ISMAIL SERAGELDIN: I have a very small pragmatic comment: we should find a way of posting all of our e-mail addresses so that, upon reflection, we can follow up with each other on some of these things.
Also, if we could post the URLs for all of the examples used in each presentation, I think it would be useful to be able to go back and visit these sites. This is just a logistical point, which I think will enable more interaction after the event.
The second, broader issue, which I think is important and has not been touched upon, but which my colleague to my left and I were just discussing, is the difference between languages and dialects. This is a very major issue.
For example, in the corpus of Arabic we have sources from 18 Arabic-speaking countries. Egypt, for example, accounts for only 13% of the total of 100 million words that we are using.
And that's because there are differences.
Even though we try to use more than classical Arabic, there are differences in words. For example, there's nothing wrong with either variant: both are correct linguistically and in every other respect, but they differ in style. We particularly need that recognition. UNESCO has done a lot of work on local languages, including non-written languages, among the 6,000 languages with all of their variations. But when we talk about the Internet, it seems to me that there is a big policy issue about the extent to which we will consolidate around broad language families.
So, for example, in the Arab world we are trying to consolidate around what we call Modern Standard Arabic, as opposed to saying this is the Moroccan dialect, this is another dialect, and so on.
And I wonder whether our colleague from UNESCO can tell us whether there are any such efforts outside of the Arab world.
>> INDRAJIT BANERJEE: Well, that's an extremely interesting point, because I think there are two sides to the debate. There are those who believe that, in the name of linguistic diversity, all of the dialects should be maintained: the more the merrier. On the other hand, there are those who believe that without some kind of standardisation, communication between different groups is very difficult. I don't have a personal position on this. UNESCO is an extremely open-minded institution; we have had lots of discussions within the Arab Group, and even within the Arab Group there are differences of opinion. So for the moment we are simply encouraging people to develop local content, assisting wherever we can with its development and with putting it online. That's where we stand. But I think the question you raise is a very fundamental one.
>> DR. HAIDAR FRAIHAT: To shed light on how we can deal with this monster called digital content creation in a specific language: I think whenever we do this, we have to keep the ecosystem in mind. Here we have UN organisations, libraries, governments and NGOs. Where is the centre of all of this? Who is the leader? Who is the focal point? Is there a focal point for content creation, or is it a matter of an ecosystem where the output of one agency is the input of another?
I think this notion is very important. If we think of a central agency or category with everyone circling around it, that's one model. If we think that every one of us is a node in a network, I think that is the right model, because everyone is a specialized agency. For example, the Private Sector worries about how to make money from content creation, distribution, dissemination, reformulation and so on. United Nations organisations in general are here to help countries, governments and policymakers, bringing best practices, standards and so on.
We have the libraries and repositories, whether a digital library or an ordinary classical library; they have a big stake in making sure that the wisdom and the culture are there and accessible to everyone. So the point I want to make is that it is highly advisable to look at content creation, digital content and digital localized content as an ecosystem.
For example, ICANN, which someone mentioned, and the other Internet administration organisations also have different kinds of stakes in content creation.
So I really recommend the ecosystem approach. Thank you.
>> MAKANE FAYE: Yes, on Dr. Ismail's question, I think it is also important to preserve the languages that are not widely used. On the African continent we have thousands of local languages, and in this context the African Academy of Languages was put in place by the African Union and UNESCO. This academy is promoting the use and wider acceptance of African languages, which are being eroded because of the colonial dominance of English and French. So we have put in place some systems to have a repository