Saturday, June 23, 2007

Assorted statistics

OmegaWiki has reached 30.000 DefinedMeanings, and we have some 258.000 Expressions. As some people keep telling me, there are about 28.000 Expressions in the largest language, which means that there is a close relation between the number of Expressions in a language and the number of concepts. This is said to indicate that OmegaWiki should be able to scale. :)

The Webalizer statistics have given me a surprise; there is now more info to be found. What is nice to see is that there is now a breakdown of where the traffic comes from. As we have a lot of traffic from crawlers, it would be good to exclude crawlers in order to see where interested PEOPLE come from. Erik told me that the new features are probably due to this week's upgrade.
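To give an idea of what excluding crawlers could look like, here is a minimal sketch in Python. It does not describe how our statistics package works. It assumes the web server writes access logs in the common "combined" format, and the list of crawler hints is a hypothetical starting point that would certainly miss some crawlers.

```python
import re
from collections import Counter

# Hypothetical substrings that often appear in crawler user agents.
# This list is an assumption for illustration and is far from complete.
CRAWLER_HINTS = ("bot", "crawler", "spider", "slurp")

# Tail of a "combined" format log line (assumed format):
#   "request" status bytes "referrer" "user-agent"
LINE_RE = re.compile(
    r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def is_crawler(agent):
    """Return True when the user agent contains one of the crawler hints."""
    agent = agent.lower()
    return any(hint in agent for hint in CRAWLER_HINTS)


def human_referrers(lines):
    """Count referrers for all hits that do not look like crawler traffic."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        if is_crawler(match.group("agent")):
            continue
        counts[match.group("referrer")] += 1
    return counts
```

Feeding the lines of an access log to human_referrers gives a count per referrer with the obvious crawlers left out. Crawlers that pretend to be a browser will still be counted.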

Malafaya is now the fourth person who has taken an interest in our statistics. He has worked on the reliability of the statistics of collections. His first effort improved the numbers; his second stab at it improved the performance of the queries a lot.

Finally, the Alexa statistics have improved a lot for no apparent reason. We have had times when we were not ranked at all or could only be found above the 800.000 range. For a few days now we have been hovering around the 368.500 mark. Still not impressive, but it looks much better. When you compare the Alexa numbers with our Webalizer numbers, the only thing that can be said is that the Alexa numbers are statistically not really valid. This will improve as our community grows.


Monday, June 18, 2007

Server upgraded; dataset support online

The server has been upgraded to Debian etch. This gives us PHP 5.2.0, which is needed to run the latest version of OmegaWiki. (In the process, we replaced our hand-compiled PHP and Apache binaries with distribution packages.) OmegaWiki itself has also been upgraded. The current version of the code has support for so-called "data-sets".

A data-set is essentially an instance of OmegaWiki which can contain a completely separate set of DefinedMeanings and associated data. This is useful for importing authoritative sources which may either not yet be fully editable, or which are meant to be retained alongside an editable version. It also allows us to showcase imported databases, to convince organizations that own the data to release it freely and make it fully editable.

The current version already supports mapping DefinedMeanings across data-sets. So you can indicate that concept A in data-set 1 is the same as concept B in data-set 2. However, it does not yet support copying data from one data-set to another, which is what we are working on right now (some hints to it are already in the code).
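As an illustration of the idea only, a mapping between data-sets can be sketched like this in Python. The class and method names are hypothetical and this is not the OmegaWiki code or its database schema. The sketch also assumes that "is the same as" is transitive, so that concepts linked through a third data-set end up in one group.

```python
class MeaningMap:
    """Hypothetical sketch: link DefinedMeanings that denote the same concept.

    A DefinedMeaning is identified by a (data-set name, meaning id) pair.
    """

    def __init__(self):
        # Every linked key points to the set of all keys in its group.
        self._links = {}

    def link(self, a, b):
        """Record that meaning a and meaning b are the same concept."""
        group = self._links.get(a, {a}) | self._links.get(b, {b})
        for key in group:
            self._links[key] = group

    def equivalents(self, key, dataset):
        """Return the meanings in the given data-set that match key."""
        group = self._links.get(key, {key})
        return sorted(k for k in group if k[0] == dataset and k != key)


# Usage: concept A in data-set 1 is the same as concept B in data-set 2.
mapping = MeaningMap()
mapping.link(("data-set 1", "A"), ("data-set 2", "B"))
print(mapping.equivalents(("data-set 1", "A"), "data-set 2"))
```

Copying data from one data-set to another, which is not supported yet, would need more than this: the mapping only says which concepts match, not which of their Expressions and definitions should be carried over.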

Currently OmegaWiki has only a single data-set. We are considering setting up some example data-sets to let the user community play with this new functionality.

A word of the day

Like so many other resources that are lexical in nature, OmegaWiki has a word of the day. Our word of the day is not prepared in advance and we leave it to the community to create one. I am always relieved when there is actually a word of the day when I wake up.

Today's word of the day is interesting for many reasons. The word is wheat. There are several issues to consider.
  • It is marked as "English (United States)". There is, however, no "English (United Kingdom)" counterpart, and since I cannot find one, it should be just "English".
  • The definition has not been translated into English. This is very much optional, but it makes it so much easier to translate the definition into yet another language.
  • In the definition, wheat is said to be part of the family "Graminacee" of the genus "Triticum". According to Wikipedia the family should be "Poaceae".
The big thing here is that taxonomy, while a science, is not exact. There is no such thing as a name that will be true forever. With some regularity it is found that names need to be changed. These revisions may mean that species that are known to the public are no longer considered to be that species; they can be split up or lumped together.

The issue here is that without being able to refer to both grass families, it is hard to appreciate this definition. This word of the day clearly shows why there is a need for a dictionary of life, a dictionary that explains all these names and shows the relations between the different validly published taxonomical names.


Sunday, June 17, 2007

OmegaWiki only a translation dictionary?

There is some misinformation about OmegaWiki; it is said, for instance, that OmegaWiki is only a translation dictionary. There are also people who do not consider OmegaWiki relevant because it is not a Wikimedia Foundation project.

For the people who have not looked at OmegaWiki for a long time, or who have not looked closely, we want to state the obvious: OmegaWiki is a translation dictionary, but not only that. When you look at the number of Expressions per language, you will find that we have almost 30.000 DefinedMeanings. The reason why we have 11.000 more English Expressions than for any other language is that we have collections that are still mostly English. Collections like the ISO-DIS-639-6 are relevant because of the information that is included in the data.

OmegaWiki is becoming relevant because our data is starting to be used outside our project as well. Positano News uses OmegaWiki data for "assisted reading"; this helps people to understand the terminology in an Italian news article. It gives you both definitions and translations.

It may be that the current possibilities at OmegaWiki are not immediately obvious; there are many DefinedMeanings that do not have any annotation. An annotation can identify the part of speech of a word, or it can provide you with a sample sentence or show how to hyphenate a word. We want to include links to other websites; we want to link to Wikipedia articles in order to make it convenient for our users to find good encyclopaedic information.

OmegaWiki is not feature complete. We want to add many more features, but our first priority is to make sure that it works well and that the features that matter most are included. We need to improve our performance, and we need to make sure that we provide a framework that facilitates collaboration with other organisations.

The Wikimedia Foundation is one organisation that we really want to collaborate with. On a personal level we have been involved, and we want to extend this by collaborating on an organisational level as well. This often repeated intention may be one reason why certain people are so apprehensive about OmegaWiki. We wanted it to be a WMF project; it is not one, but we still see room for doing good together.


Saturday, June 16, 2007

If you love somebody set them free

On OmegaWiki we have many sysops. Giving people the abilities that come with the sysop flag is what has prevented a lot of vandalism and spam. We are happy and grateful that this has worked out so well for us. As a consequence, we do not have the eternal admins versus editors controversy. Our admins do not have to do anything; they are kindly requested to do good and, amazingly, they do.

With some sadness, we learned that a Wiktionary admin is leaving Wiktionary; he was told to be more active or else. There is a silver lining in that this guy announced that he will become more active on OmegaWiki. Obviously every project makes its bed and lies in it. We have chosen to have as little bureaucracy as possible. The question is very much: how is it going to scale?

OmegaWiki will expand by including "Wikis for Professionals". Each will include the terminology for a specific domain extended with specific information and functionality. With more people signing up to such a community, it may acquire its own rules. These rules should fit in the larger community that is OmegaWiki. What I expect is that often the unwritten rules will be the more important ones. In a Wiki for Professionals, people will be interested when the project is relevant. When this proves to be demonstrably so, it may become important to be identifiable to gain the benefits of the association with the project. The flip side of the coin is that negative behaviour can damage a professional reputation.

In a year, the community of OmegaWiki will be different. We work hard to provide it with an environment that will enable it to do good. At this stage it is still very much basic functionality that we are building. There is much new functionality and data waiting to go live. When it has, we will love to hear what is good and what could be better. We will love it when people help us morph our functionality and make our environment more relevant.

The only thing that we will insist on is that things can coexist and that people collaborate. In that way we set not only the data free but also the imagination; we will love it and we will set them free.


Sunday, June 03, 2007

250.000 expressions

Today we reached the milestone of 250.000 Expressions at OmegaWiki. It is special because most of this data has been entered by hand. We find that when people get enthused by the concept of OmegaWiki, they do make a difference for the language that they champion.

We have people who have a particular interest in Georgian, Khmer and Spanish. It shows in the statistics, as these languages grow much faster than the others.

Aveyron is the 250.000th entry in OmegaWiki, and it is only fitting that Ascánder was the person adding it. Ascánder is one of the most valuable contributors to OmegaWiki. Aveyron is part of a project to include information from ISO 3166-2. This standard details in what way countries are subdivided. It does not state that Italy has provinces, that the USA has states or that Germany has Bundesländer. It does give the names of these entities.

So, OmegaWiki is evolving nicely. We hope that, in line with how Wikis evolve, we will have an easier time getting the next 250.000 Expressions.