Sunday, December 30, 2007

Numbers in a translation dictionary

In the Wikimedia Foundation, there are inclusionists and exclusionists. Some are of the opinion that certain topics should not be included in Wikipedia, while others think they should. The most brilliant example is the inclusion of all the bus stops in Japan. Someone took the effort to describe them and people actually find them useful.

Some people are enamoured of constructed languages and spend a lot of time making such languages their own. I personally have had dealings with at least three people who speak Volapük, and I know people who go to congresses because there they meet others who speak Esperanto. Many constructed languages have more speakers than many natural languages (that are not yet extinct).

For exclusionists it is not palatable when constructed languages do well. There are always "good" reasons why those others need to be excluded. Hidden in the discussion about the "radical cleanup of the Volapük Wikipedia" is a discussion about the inclusion of numerals like 588 in the Limburgian Wiktionary and as you can imagine "it is not good".

In a translation dictionary there are reasons to include numbers. The point is that they are not written the same in all scripts. OmegaWiki has its fair share of numbers, and it did not address this issue.

By adding a new class, we now allow the representation of both Arabic and Roman numerals, and hundred now has both 100 and C as annotations. In this way we do not need a separate page for each numerical representation.
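To illustrate the idea of one entry carrying several numerical representations, here is a minimal sketch in Python. It is not how OmegaWiki stores its data; the names `NumberEntry`, `annotate` and `to_roman` are made up for this example, and the Roman conversion only covers 1 to 3999.

```python
from dataclasses import dataclass, field


def to_roman(n: int) -> str:
    """Convert an integer in the range 1..3999 to a Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("this sketch only handles 1..3999")
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


@dataclass
class NumberEntry:
    """One entry for a number word, with its notations as annotations."""
    expression: str
    value: int
    annotations: dict = field(default_factory=dict)


def annotate(expression: str, value: int) -> NumberEntry:
    # Hypothetical annotation keys; one entry holds every representation.
    return NumberEntry(
        expression,
        value,
        {"arabic": str(value), "roman": to_roman(value)},
    )


hundred = annotate("hundred", 100)
print(hundred.annotations)  # {'arabic': '100', 'roman': 'C'}
```

The point of the design is that "100" and "C" hang off the one entry for hundred instead of each getting a page of its own.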


Tuesday, December 18, 2007

Languages ...

Given a fixed point, you can move the world. In many ways, your language provides you with the tools to wrap your mind around the world, to express its essence in a way that may be understood by the people you communicate with, and to help you shape the world as you know it.

Language is both individual and shared. My English has been shaped by my schooling in the Netherlands, my stay in the United Kingdom and the many times I expressed myself in e-mail, articles, presentations and when skyping. Typically I get it right but a text can be understood by some and misunderstood by others. It is my English but to function it has to be expressed in a way that is shared with others.

For some subjects I prefer my native language, for others I prefer English. To communicate, the language used must be sufficiently shared by everyone involved. The language must be received; when I am surrounded by a vacuum, nobody will hear me talk. To read this blog, you either have access to a computer or someone must print it out for you.

A language lives when people use it, when it is part of a distinct community, a distinct culture. When the boundaries around such a community or culture disappear or change, the language either morphs or it dies. To understand history, you have to understand its artefacts and its language. Many languages have died and are dying, and with them we lose the history and the culture of the people who spoke them. They may leave their literature and inscriptions, and when enough is left, we may understand what it says. The trick will be to understand it as it was meant when the language, the culture, was alive.

Saturday, December 15, 2007

Who needs birthPlace

With some regularity I try to better understand the Semantic Web and associated subjects. I find it hard going but also a compulsive subject. When you express the relation "Johan Cruijff" "birthPlace" "Amsterdam", it is understandable to you as a reader, but for humans it should read like "Johan Cruijff was born in Amsterdam" or "Johan Cruijff werd geboren in Amsterdam". This magical statement "birthPlace" can be interpreted when you know your English; otherwise it is truly for machines only.

OmegaWiki does express relations; you will find, for instance, that Amsterdam is the capital of the Netherlands. In essence it is expressed as a triple, but it is expressed in natural language and, depending on the existence of a translation, you will read the relation in the language selected as your user preference.

How do we combine what we do with what happens elsewhere? My latest idea is based on the RDF tag "birthPlace". It is a construct that obviously needs a natural language equivalent, and this is what OmegaWiki can provide. A method is needed to connect the two. In order to function, birthPlace has a very precise definition, and this definition must be part of a collection of such definitions. These labels need to be linked to OmegaWiki DefinedMeanings as the identifier for an OmegaWiki collection.

To make this useful, an external application needs to call a function that provides the translation to a specified language. How to combine this with the notion of a URN, I have not figured out yet.
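As a rough sketch of what such a function could look like, the Python below turns a triple into a sentence in a requested language. Everything in it is an assumption made for illustration: the `TEMPLATES` table, the `render_triple` name and the language codes are hypothetical, and a real version would look the labels up in OmegaWiki instead of a hard-coded dictionary.

```python
# Hypothetical table: predicate label -> sentence template per language.
# The two sentences are the English and Dutch readings of "birthPlace".
TEMPLATES = {
    "birthPlace": {
        "en": "{subject} was born in {object}",
        "nl": "{subject} werd geboren in {object}",
    },
}


def render_triple(subject: str, predicate: str, obj: str, language: str) -> str:
    """Render a (subject, predicate, object) triple as a sentence."""
    template = TEMPLATES.get(predicate, {}).get(language)
    if template is None:
        # No translation available: show the raw, machine-oriented triple.
        return f'"{subject}" "{predicate}" "{obj}"'
    return template.format(subject=subject, object=obj)


print(render_triple("Johan Cruijff", "birthPlace", "Amsterdam", "en"))
print(render_triple("Johan Cruijff", "birthPlace", "Amsterdam", "nl"))
print(render_triple("Johan Cruijff", "birthPlace", "Amsterdam", "fr"))
```

When no translation exists, the sketch falls back to the bare triple, which is exactly the "for machines only" form.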


Thursday, December 13, 2007

Eastern Yiddish

Eastern Yiddish is one of the two varieties of Yiddish that have been recognised as languages in their own right in ISO-639-3 (ydd). In OmegaWiki, we now have our first 213 Expressions in this language and I am impressed with the amount of work that has gone into it; most have annotations indicating hyphenation and the pronunciation using IPA notation.

Eastern Yiddish is not one of the languages supported by MediaWiki, and the mechanism for showing localised content is connected to the language selected in the "User Preferences". Siebrand helped me work out which files need to be changed and added. Kim helped me do it for the first time, and now the first localisation is visible for Eastern Yiddish.

The MediaWiki localisation itself uses Yiddish as the fallback language, so the experience is pretty good for now. What Siebrand indicated is that it is possible to include languages like Eastern Yiddish in BetaWiki. This would create stubs that are of benefit to OmegaWiki. It would prepare for the moment when people start localising in earnest.

I think it would be a good thing, but I am interested to learn what other people think.
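For readers who wonder what a fallback language means in practice, here is a small Python sketch of the general idea. It is not MediaWiki's code: the `FALLBACK` and `MESSAGES` tables, the message keys and the placeholder texts are all invented for this example.

```python
# Hypothetical fallback chain: Eastern Yiddish -> Yiddish -> English.
FALLBACK = {"ydd": "yi", "yi": "en"}

# Placeholder messages; the bracketed strings stand in for real translations.
MESSAGES = {
    "ydd": {"search": "<Eastern Yiddish text for 'search'>"},
    "yi": {
        "search": "<Yiddish text for 'search'>",
        "login": "<Yiddish text for 'login'>",
    },
    "en": {"search": "Search", "login": "Log in", "help": "Help"},
}


def get_message(key: str, language: str) -> str:
    """Return the message in the first language of the chain that has it."""
    seen = set()  # guards against a cycle in the fallback table
    while language is not None and language not in seen:
        seen.add(language)
        text = MESSAGES.get(language, {}).get(key)
        if text is not None:
            return text
        language = FALLBACK.get(language)
    return f"<{key}>"  # nothing found anywhere in the chain


print(get_message("search", "ydd"))  # localised in Eastern Yiddish
print(get_message("login", "ydd"))   # falls back to Yiddish
print(get_message("help", "ydd"))    # falls back to English
```

With only a handful of messages localised, most lookups land on the fallback, which is why the experience is already reasonable before anyone localises in earnest.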


Monday, December 03, 2007

Supporting American English

American and British English are two varieties of the English language. They have substantial differences, but they are sufficiently alike that they are unlikely to be mistaken for separate languages.

In OmegaWiki, it has been possible to add entries for English; this meant that either no distinction was made between the different written versions of English, or you had to specify both versions. Issues like this exist for other languages like Serbian and Mandarin as well.

In the OmegaWiki user interface, languages are considered ISO-639 entities. When a DefinedMeaning for a language is part of the appropriate collection, we use the translations in our user interface. The problem is that all these linguistic entities are needed now and that they are created to make OmegaWiki work.

For the ISO-639-6 there will be issues as the codes we make, using the RFC 4646 methodology, will be replaced. It will also be interesting to learn how in the end everything will be merged together.

In the meantime we now support localisation for these linguistic entities.
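To give an impression of what codes in the RFC 4646 style look like, here is a small Python sketch that splits a tag such as en-US or sr-Latn into its parts. It is deliberately limited and is not OmegaWiki's implementation: `LanguageTag` and `parse_tag` are names made up for this example, and only the language, script and two-letter region subtags are handled. Real tags can carry more (variants, extensions, numeric regions).

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def _is_letters(text: str, length: int) -> bool:
    return len(text) == length and text.isascii() and text.isalpha()


def parse_tag(tag: str) -> LanguageTag:
    """Parse a simple language[-Script][-REGION] tag."""
    parts = tag.split("-")
    language = parts[0]
    if not (_is_letters(language, 2) or _is_letters(language, 3)):
        raise ValueError(f"unsupported language subtag in {tag!r}")
    rest = parts[1:]
    script = region = None
    if rest and _is_letters(rest[0], 4):
        script = rest.pop(0).title()   # scripts are written like "Latn"
    if rest and _is_letters(rest[0], 2):
        region = rest.pop(0).upper()   # regions are written like "US"
    if rest:
        raise ValueError(f"unsupported subtags in {tag!r}: {rest}")
    return LanguageTag(language.lower(), script, region)


print(parse_tag("en-US"))    # American English
print(parse_tag("en-GB"))    # British English
print(parse_tag("sr-Latn"))  # Serbian written in Latin script
```

Both English varieties share the language subtag `en`, so an interface can still treat them as one language while keeping the written forms apart.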


Sunday, December 02, 2007

Sinterklaas present

In the Netherlands we traditionally do not get presents at Christmas. For us, Sinterklaas is celebrated on the fifth or sixth of December. In my family nobody believes in Sinterklaas any more, and consequently we can celebrate it at a more convenient moment, such as a weekend.

I have had a wonderful Sinterklaas, and I do want to tell you about the present that Kipcool and Kim gave me. Kipcool wrote this wonderful functionality that shows you what classes we have in OmegaWiki and how many translations we have in your language.

With the new functionality you will see the concepts that are translated in your language. When you check out a concept, you will even find what attributes are available in your language. I have found it to be really addictive. :)