Sunday, December 21, 2008

Words I do not know

I read and write a lot of English. Every now and again I come across a word or an idiom I do not know. When I have the time I add them to OmegaWiki. When I look at the ones I added today (fecklessness, valedictory, fricassee and enunciate), I wonder if these are the kind of words other people are looking for as well.

Friday, November 28, 2008


Some words that you look up in a dictionary have a particular usage associated with them that is not always apparent. The word Balkans, for instance, is always preceded by "the" when it is used as a noun. The same is true for the Ivory Coast.

This new type of annotation was requested because people wanted to indicate the difference in usage between words that were otherwise synonymous.


Wednesday, November 05, 2008

Spanish at 30,000 expressions

Today I am proud to announce that Ascander added the thirty thousandth Spanish expression in OmegaWiki.

Sunday, November 02, 2008

The beauty of words

Last Friday Afshin Afkari presented his Dutch-Persian idiom dictionary. The publisher, Amsterdam University Press, organised a dialogue with Afshin and Hugo Brandt Corstius in Spui27 in Amsterdam titled "the beauty of words". Going to presentations like this is something that I rarely do. It was however great fun, particularly because of the people that you meet at such occasions.

When you read a book like this, you can enjoy a great print. It really looks good. In a book that contains both the Latin script and the Perso-Arabic script, it takes more effort to achieve this. One of the things that Afshin included was the transcription of the Persian texts in Eurofarsi. This seems to me a smart move as it is a great aid for people who learn Persian. Eurofarsi, I am told, is used quite a lot by people on the Internet and by people who SMS.

The exposure to Afshin's book made me think about how I would include idiom in OmegaWiki. When you have an idiom in a language, it has a meaning. The question is how you deal with translations. When you think of it, there are two issues: there is idiom with the same meaning in other languages, and there is the need for a literal translation of the text.

It seems to me that you should treat idiom differently from normal lexical content. You would leave it as a DefinedMeaning without translations but with annotations for equivalent idiom, a definition of the meaning and literal translations of the idiom itself. Idiom often has a few key words, e.g. in "blood is thicker than water" you would like to refer to both blood and water. Contrary to what is usual in OmegaWiki, you want to refer to these concepts in the same language as the idiom.
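To make the idea concrete, here is a minimal sketch in Python of what such an idiom record could look like. The class name `IdiomEntry` and all of its field names are hypothetical; they are not part of the OmegaWiki schema, and the definition text is my own wording.

```python
from dataclasses import dataclass, field


@dataclass
class IdiomEntry:
    """Hypothetical record for one idiom; not the actual OmegaWiki schema."""

    language: str      # language of the idiom itself
    text: str          # the idiom as written
    definition: str    # what the idiom means
    # Literal, word-for-word renderings, keyed by target language code.
    literal_translations: dict[str, str] = field(default_factory=dict)
    # Idioms in other languages that carry the same meaning.
    equivalent_idioms: dict[str, str] = field(default_factory=dict)
    # Key words of the idiom, referenced in the idiom's own language.
    key_concepts: list[str] = field(default_factory=list)


entry = IdiomEntry(
    language="en",
    text="blood is thicker than water",
    definition="Family ties are stronger than other relationships.",
    key_concepts=["blood", "water"],
)
# A literal translation is kept apart from idioms with an equivalent meaning.
entry.literal_translations["nl"] = "bloed is dikker dan water"
```

Keeping the literal translations and the equivalent idioms in separate fields mirrors the two issues: one answers "what does it say", the other "what would a speaker of that language say instead".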

The question is, how to support this in OmegaWiki mark II.

Thursday, October 30, 2008

Forty thousand DefinedMeanings

Having 40,000 DefinedMeanings is a nice milestone. The graphic shows how the number of expressions per language tails off.

Congratulations to all the people who contributed to this success. :)

Tuesday, October 28, 2008

French has over 20K expressions

I expect that with 20k expressions French starts to become relevant as a resource in OmegaWiki. The good news for our users is the number of translations a concept is connected to. Currently the ratio is 9.0402 Expressions per DefinedMeaning.

A big thank you to everyone who contributed to the French success :)

Tuesday, October 14, 2008

MySQL Workbench Beta program

Sun is working on improving the MySQL Workbench designer. We have been using Workbench for OmegaWiki mark II and have been asking questions and making suggestions. It is with some satisfaction that we find that we have been invited to the MySQL Workbench beta program.

Currently our software uses some 30 business rules, and these amount to some 6000 lines of code. They model a complex dynamic behaviour that is responsible for most of the "work behind the scenes". These rules are implemented as triggers. Workbench can currently contain them, but it cannot describe their behaviour from the analyst's point of view.

We are happy and satisfied being part of this program.

Monday, October 13, 2008

A Commons that supports multiple languages

Commons is great; it is the biggest resource of freely usable media files, some 3,363,734 of them. They are a great resource when you are looking for pictures and when you know English. If you do not know English, it might as well not exist.

Given that the Wikimedia Foundation cares about its educational value for everyone, it makes sense for Commons to be used by people who read and write other languages as well. This dream of supporting all the other people as well is an old one; I wrote about it as early as May 2005.

With the support of the Digitale Pioniers, we have been able to create a proof of concept project. This project demonstrates that we can do this. Only a few languages are currently supported, but they are sufficient to demonstrate the principles.

The issue is that it is for the Wikimedia Foundation to show an interest. There is a growing group of people who have seen that it can be done. When I give presentations about this, people indicate that this is a "must have" feature. I will talk again at the Wikimedia Conferentie NL and I hope to discuss with the WMF soon what it takes to support multiple languages at Commons but, more importantly, whether it wants to.

Wednesday, September 17, 2008

Wikiprofessional is doing Portuguese

One function of Wikiprofessional that I really like is the Concept Web Linker. If you have not played with it, you should. What I am really happy with is that the Concept Web Linker is getting into other languages. I have an example of this.

This is the screen dump.

Thursday, September 11, 2008

Commons but now multi lingual

This is the category "Felis silvestris catus" as it is on Commons, with a twist. This screen makes the most sense when you can read Dutch. Obviously, the English text is still there, but that only helps when you understand that language.



Thursday, August 28, 2008

OmegaWiki goes Squeak

OmegaWiki was implemented in MediaWiki as an extension during its first iteration. At that time this was a great decision. It provided us with a lot of great functionality and it was the environment that we knew. MediaWiki is a wonderful collaborative application. In OmegaWiki mark II we are looking for other things; this has led to making OmegaWiki independent from MediaWiki.

OmegaWiki intended to do everything in one database. This meant that it was problematic to use our data for other purposes. We also had a situation where different applications wanted their specific data included in the database and needed control of the data involved. This could not be done in the first iteration of OmegaWiki so we had to rethink our ways.

As data may be connected and often will be shared, a peering model is required. Data will be used by many applications and consequently the data needs to be provided in a way that allows for many applications. For the data we will have one interface that uses a standard XML interface to provide the data. This will allow for many applications to use the same data and it will prevent the mixing of data and user interface elements as we saw in the past.

There will no longer be only one database; there will be many. The central or “global” database will provide basic information that will be CC-by licensed. This will allow anybody to have their own “regional” or “local” data and refer to shared concepts for instance for sharing or mapping purposes.

Regional and local databases can be licensed and maintained in a manner that makes sense to the people involved. They can include data that is not really acceptable from a pure linguistic point of view, for instance MALARIA as a synonym for malaria. They can include all kinds of relations between concepts, relations that may be really specialised or that require particular validations before they are published.
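The layering of global, regional and local data can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the `ConceptStore` class, the `parent` link and the identifier `dm:malaria` are hypothetical and do not describe the actual mark II design, which uses MySQL and an XML interface.

```python
from typing import Optional


class ConceptStore:
    """Hypothetical in-memory stand-in for one database layer."""

    def __init__(self, name: str, parent: Optional["ConceptStore"] = None):
        self.name = name
        self.parent = parent  # a local store points at the store it peers with
        self.synonyms: dict[str, set[str]] = {}

    def add_synonym(self, concept_id: str, expression: str) -> None:
        """Record an expression for a concept in this layer only."""
        self.synonyms.setdefault(concept_id, set()).add(expression)

    def lookup(self, concept_id: str) -> set[str]:
        """Return expressions from this layer plus every layer above it."""
        found = set(self.synonyms.get(concept_id, set()))
        if self.parent is not None:
            found |= self.parent.lookup(concept_id)
        return found


global_db = ConceptStore("global")
local_db = ConceptStore("local", parent=global_db)

global_db.add_synonym("dm:malaria", "malaria")
local_db.add_synonym("dm:malaria", "MALARIA")  # acceptable locally only

print(sorted(local_db.lookup("dm:malaria")))   # local view sees both spellings
print(sorted(global_db.lookup("dm:malaria")))  # global view is unaffected
```

The point of the sketch is that the local layer refers to the shared concept without writing anything into the global data.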

Another application that we have always wanted to give our data to is the OLPC and equivalent networks. This means that our database has to be able to function stand-alone, and to synchronise and share improved data when a connection becomes available.
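A stand-alone database that synchronises later can be sketched as follows. This is a deliberately naive illustration in Python, not the mark II mechanism: the class `OfflineReplica` is hypothetical, the central database is faked with a plain dictionary, and conflicts are resolved by simply letting the replica's edits win.

```python
class OfflineReplica:
    """Hypothetical stand-alone copy that queues edits until it is online."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.pending: list[tuple[str, str]] = []  # edits made while offline

    def edit(self, key: str, value: str) -> None:
        """Apply an edit locally and remember it for the next synchronisation."""
        self.data[key] = value
        self.pending.append((key, value))

    def synchronise(self, central: dict[str, str]) -> int:
        """Push queued edits to the central copy, then pull its current state.

        Returns the number of edits that were pushed.
        """
        pushed = len(self.pending)
        for key, value in self.pending:
            central[key] = value  # naive: the replica's edit always wins
        self.pending.clear()
        self.data.update(central)
        return pushed


central = {"water": "H2O"}
replica = OfflineReplica()
replica.edit("blood", "red liquid")     # made without a connection
pushed = replica.synchronise(central)   # a connection became available
```

A real implementation would need conflict handling and some notion of versions; the sketch only shows the queue-then-exchange pattern.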

As a consequence of all these considerations, we have been looking for the best technology that will serve our purpose; MySQL 5.1 will provide us with new functionality that makes an important difference. Squeak is a programming platform that we think will provide us with the tools to build the rich environment that we dream of having.

As the OLPC project also uses Squeak, it will allow us to bring our information to this great educational project, and in return we hope that people will find OmegaWiki an environment to contribute their Squeak work to and help us build dictionaries in the many languages spoken and written where the OLPC will become available.


Tuesday, June 17, 2008

Fast and furious

I just learned that OmegaWiki mark II has been updated; it now allows for multiple statements. Bèrto indicated that he will update the documentation to reflect this. The example code I have does not get posted because the Blogger software wants to interpret it ...

I am sure that it will find its place in the documentation ... So, it is for you to RTFM :)

OmegaWiki mark II

It has been relatively silent around OmegaWiki; this silence, however, is very deceptive. Much hard work has gone into preparing a new codebase for OmegaWiki. The new code has to deliver new functionality in order to justify the huge investment in time and money.

Why change?
  • The existing user interface and the database routines are very much entwined; we want to separate them
  • We want to provide services based on the OmegaWiki data; we will provide an XML interface into the data
  • The underlying technology has changed a lot; we are using the bleeding edge of the MySQL database
  • Our data can be used for applications; we will provide a way to separate data that is of general interest and data that is not
  • For some applications it is really helpful when the data can be used in an off-line environment; we will provide a way of synchronising databases
  • There is more ...
The first data in the OmegaWiki mark II environment is now available. It is converted data from OmegaWiki; it is the DefinedMeanings with a SynTrans record in English. You can find it here.

At the bottom you will find a document explaining the API; it is written in the Open Office format.

Enjoy, have fun and tell us what you think of it. The data is experimental so we will replace it with more data at a later time.

Sunday, June 08, 2008

Internationalisation and localisation

When you write software that is to be used by people who speak many languages, internationalisation is a key requirement. It is the precursor to localisation: the changes made by the localisers to support their language.

MediaWiki is really good at localisation, and it is being perfected all the time. When software is written without considering internationalisation / localisation from the start, it is quite a job to get this right.
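For those who have not seen what internationalisation looks like in code, here is a minimal Python sketch. It assumes a hypothetical message table `MESSAGES` and helper `message`; it is not how MediaWiki or the Vocabulary Trainer do it, and the message keys and texts are made up for the example.

```python
# Hypothetical message catalogue: the code refers to keys, never to raw text.
MESSAGES = {
    "en": {
        "start": "Start exercise",
        "score": "You scored {correct} out of {total}",
    },
    "de": {
        "start": "Übung starten",
        # "score" is not localised yet, so German users fall back to English.
    },
}


def message(key: str, lang: str, fallback: str = "en", **params: object) -> str:
    """Return the message for `key` in `lang`, or in `fallback` if missing."""
    text = MESSAGES.get(lang, {}).get(key)
    if text is None:
        text = MESSAGES[fallback][key]
    return text.format(**params)


print(message("start", "de"))
print(message("score", "de", correct=3, total=5))
```

Internationalisation is the work of routing every user-visible string through something like `message`; localisation is then a matter of filling in the table for each language.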

The OmegaWiki Vocabulary Trainer needed internationalisation / localisation badly. The first issue with the software was that it was not available in German. The software was paid for by the University of Bamberg to be used by its students. These students are expected to know German and this trainer is a tool to help them learn languages. OmegaWiki is very much dedicated to making information available in many languages and not considering internationalisation in its associated software is ... odd.

It is with relief that I can now announce that the Vocabulary trainer is supported in Betawiki. I am grateful to the kind developers at Betawiki who made this possible. You can localise for your language here ...

Tuesday, June 03, 2008

Getting a better look and feel ..

The vocabulary trainer was introduced ... It looked awful; it now has been improved considerably. As we wrote last time, it is open source and it is being worked on. One of the items on the "must have" list ... localisation :)

Thursday, May 29, 2008

Publish early, publish often ...

For Open Source software, one of the mantras is to publish early and publish often. In this spirit I am happy to announce the first version of a vocabulary trainer on OmegaWiki.

This software uses the content of OmegaWiki, so the quality of the exercise is also determined by the amount of terminology that is contained in OmegaWiki. The current functionality is not yet feature complete. One of the things we want to do with this software is have a list of words and phrases that are good to know when you go abroad. Wikimania 2008, anyone?

NB the vocabulary trainer can be accessed from the OmegaWiki main page :)


Wednesday, April 16, 2008


Georgian is a language that started with no content in OmegaWiki. It is spoken by some 4.4 million people mainly in Georgia, Turkey, Iran and Russia. It is written in the Georgian alphabet.

Statistics are in and of themselves not that relevant, but they do allow you to tell a story. OmegaWiki started with the content of GEMET, the GEneral Multilingual Environmental Thesaurus; the languages that are part of this resource have a head start in size.

Today, thanks to the hard work of Sopho, Georgian is the first language that grew bigger than one of the languages supported by GEMET. The OmegaWiki statistics show that Japanese might be the next language to grow bigger than Slovenian.

Because of the beautiful characters, Georgian is my favourite example of showing the value of the localisation in OmegaWiki. It is really special to see the same content optimised in such a way. :)

Saturday, April 05, 2008

Unicode 5.1

Today I learned that Unicode 5.1 has been released. The information that I received tells me that one major feature will be of particular relevance to Japanese, Chinese and Korean texts by enabling ideographic variation sequences. The line breaking for Polish and Portuguese hyphenation has been improved. The Indic languages will be happy with the improved text segmentation algorithm.

There are 1624 newly encoded characters; this includes characters required for Malayalam and Myanmar, but there are also new characters for the Latin script. New is support for the Cham, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts.

For the techies, the collation algorithms have been updated to include all the new characters. This also has an effect on contractions like the ch in the Slovak language.
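To show what a contraction means for sorting, here is a toy Python sketch. It is not the Unicode Collation Algorithm; it only assumes the well-known Slovak rule that "ch" counts as one letter that sorts after "h", and it ignores diacritics entirely. The names `ALPHABET`, `RANK` and `collation_key` are made up for the example.

```python
# Simplified alphabet: plain Latin letters with the digraph "ch" after "h".
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: position for position, letter in enumerate(ALPHABET)}


def collation_key(word: str) -> list[int]:
    """Build a sort key in which "ch" is treated as a single letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            # Letters outside the simplified alphabet sort last.
            key.append(RANK.get(word[i], len(ALPHABET)))
            i += 1
    return key


words = ["chata", "cena", "hora", "ihla", "dom"]
print(sorted(words))                     # code point order: chata lands under c
print(sorted(words, key=collation_key))  # contraction-aware: chata follows hora
```

With plain code point ordering "chata" ends up between "cena" and "dom"; with the contraction it moves to its dictionary position after "hora".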

Many of these things have an effect on languages supported in Wikimedia projects. My question is when we will have support for this. Is this a function of the MediaWiki / PHP code, or is it also a function of the browser?

Tuesday, March 18, 2008

OpenStreetMap terminology

OpenStreetMap, for those who do not know the project, is a free editable map of the whole world. Its data is freely licensed, it is built by volunteers and it is very much a work in progress.
OpenStreetMap has its own terminology, and given its origin it is British. The maps can be quite good, providing you with sufficient information to plan your route. There are routes for pub crawls and other innovations :)

As OpenStreetMap intends to provide a map of the world, the people that make these maps have to literally put themselves on the map. In the Netherlands we have been blessed because maps have been made available to the project. A friend of mine is not yet on the map.
It is clear that the terminology used in Italy is not the same as in the UK or the Netherlands. It makes sense for an Italian to have Italian terminology available to him and Dutch would do me nicely.

At a meeting of the Digitale Pioniers, I met people of OpenStreetMap and we agreed to include their terminology in OmegaWiki. I have now entered the words that have the key "highway" and invite you all to come to OmegaWiki to add translations in your language.

What is special in this data is that I have added the definitions provided by OpenStreetMap as alternate definitions as well. In this way they have been marked as definitions provided by OpenStreetMap.

All the OpenStreetMap terminology in OmegaWiki can be found here.

Tuesday, March 11, 2008

Less is more

There are two types of content in OmegaWiki; there are the standard MediaWiki namespaces with the standard MediaWiki content and there is the OmegaWiki specific content. The specific content consists of database records: the DefinedMeanings and Expressions.

As they are handled completely differently, it means that the functionality available to these types of data is different as well. Particularly the "move" functionality has given us problems in the past. This broken functionality has now been removed from the screens.

Firefox 3 beta 4 is significantly faster

OmegaWiki is a web application that requires a lot from a system. I am really happy to report that the latest beta of Firefox gives me a significantly better performance on the same hardware. I reported in the past that the Firefox beta did a good job for me, but this time the performance is noticeably faster on a page like Nederland.

Firefox provides cutting edge technology and really makes a big difference to me. Now when you want OmegaWiki to perform even better, you can choose to move to this latest software. The other good news is that the spell checkers that are relevant to me are now available as well...

Saturday, March 01, 2008


OmegaWiki has a technical problem; there are certain records that have problems and that crash the database. There are solutions to this problem and some are being tested at the moment.

This week I was in Bamberg at the Otto-Friedrich University, and discussed this with Martin Mai. He has now built a monitor that checks if OmegaWiki is still alive. This works fine. We now have permission for the Bamberg Nagios service to run a script when OmegaWiki is no longer alive.
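I do not have Martin's monitor to show, but an "is it alive" check of this kind can be sketched in a few lines of Python. This is an assumption about the general shape, not his code: the function names are hypothetical and the URL is a placeholder. The exit codes 0 (OK) and 2 (CRITICAL) follow the usual Nagios plugin convention.

```python
import urllib.request

OK, CRITICAL = 0, 2  # conventional Nagios plugin exit codes


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Fetch the URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_alive(url: str, fetch=fetch_status) -> tuple[int, str]:
    """Return a (exit code, message) pair describing whether the site answers."""
    try:
        status = fetch(url)
    except OSError as error:  # URLError, HTTPError and timeouts are OSErrors
        return CRITICAL, f"CRITICAL - {url} unreachable: {error}"
    if status == 200:
        return OK, f"OK - {url} answered with HTTP 200"
    return CRITICAL, f"CRITICAL - {url} answered with HTTP {status}"


# Demonstration with a fake fetcher, so no network access is needed here.
code, text = check_alive("http://example.org/", fetch=lambda url: 200)
print(code, text)
```

Run as a plugin, the script would print the message and exit with the code; the monitoring service can then trigger a recovery script whenever the result is CRITICAL.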

This will solve one of my biggest worries which is the availability of the OmegaWiki service.

Wednesday, January 16, 2008

Bounties for the localisation of MediaWiki

The "Stichting Open Progress" is happy to announce that it has received a grant from Hivos, to improve the localisation of MediaWiki. Open Progress is going to offer a bounty of up to 200 EURO for the full localisation for a language. Given the activity of Hivos, a Dutch NGO, a bounty will only be available for languages in Asia, Africa and Latin America that have a sizable number of speakers.

With this project we hope to achieve that MediaWiki can indeed claim to be one of the best Open Source projects that provides great localisation for many many languages out of the box. It will improve the usability not only for the WMF projects in that language but also for projects like Wikimedia Commons, Wikieducator, Wikihow, OmegaWiki .. the list goes on ..

The budget we have is substantial but limited. We will be sadly happy when we have to announce that we have run out of money. Sad because we want to localise more languages, happy because so many languages will have been improved.

For the precise details of the project I refer to the details on Betawiki.

NB The amounts are inclusive of the money transfer costs and as these can be substantial, we offer some alternatives. In the past I have proposed a scheme called "Donations, putting your money where your mouth is". In this scheme you choose to get paid or donate the money for one of the projects that we advertise under this scheme. Another way of getting the money paid is when people in a country agree to work together and have us pay the money together as well. This could for instance work in the case of Wikimedia India...


Tuesday, January 01, 2008

2008, International Year of Languages

The new year, 2008, has been designated the International Year of Languages by UNESCO. Countries and organisations are invited to participate, and I think that what we do in our projects qualifies us as participants. Our projects are relevant for many languages, we welcome new languages and we provide help and infrastructure to make them a success. Our communities are knowledge societies in which everyone can participate and benefit. We promote universal access to information and ensure in this way the use of an increasing number of languages.

Siebrand wrote an overview of the localisation of MediaWiki. Real support is provided to some 170 languages but only 47 have a minimal localisation. In a way it mirrors our projects; our projects do great in some languages while at the other end language versions are closed because there is not enough of a community that supports them.

The Wikimedia Foundation is becoming more mature; we aim to ensure that all our projects do well. Barriers to entry have been in place for new languages and it has led to improved localisation of MediaWiki. When new projects are finally approved, they are already of a size both in articles and participants that there is less need for anxiety for their future.

When we are to participate in the Year of Languages and continue to do what we do well and improve where we are weak, our projects will prove to be credible participants in this Year of Languages. Participating will give our Wiki way more credibility and it will give us access to people and organisations that can help us in fulfilling our goal; sharing the sum of all knowledge with every single human being.