ISAT2008 / NACE rev2 / ISIC4 mappings
For a while now my Icelandic registered companies scraper has been providing Opencorporates with data on Icelandic companies.
The data it’s pulling in isn’t very detailed, but one of the items is a URL that points to the Icelandic tax office’s XML document on each company. Here is a sample: http://rsk.is/ws/other/enterprisereg/6710112010.xml
One of the items in the XML is an ISAT2008 code (and sometimes a discontinued ISAT95 code). This code is the equivalent of the NACE rev2 code. NACE is the Statistical Classification of Economic Activities in the European Community (in French: Nomenclature statistique des activités économiques dans la Communauté européenne, hence the acronym). Basically it’s a code to classify businesses. A full list of NACE rev2 codes is here. You can read up on ISAT2008 codes here (in Icelandic) and search them here (also in Icelandic).
I wanted to add this ISAT2008 code to my scraper and after some digging found out that the ISAT2008 code is simply the Icelandic version of NACE rev2, with some minor additions. For instance the ISAT2008 code 01.13 is the same as the NACE rev2 01.13 code (and stands for “Growing of vegetables and melons, roots and tubers”). The NACE codes do not have any additional codes in that specific category (that is, NACE codes stop at 4 digits). The ISAT2008 classification, on the other hand, does have additional classifications, like 01.13.2 (“The growing of potatoes”).
So what we’ve discovered is that a 4 digit ISAT2008 code is equal to a 4 digit NACE rev2 code. All is good.
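As a rough sketch of that rule: assuming ISAT2008 codes are always written in the dotted form shown above (like 01.13.2), the matching NACE rev2 code is just the first two dotted groups. The function name isat2008_to_nace2 is made up for this illustration and is not part of the scraper.

```python
def isat2008_to_nace2(isat_code):
    """Return the 4-digit NACE rev2 code for a dotted ISAT2008 code.

    Assumes the dotted notation used in this post, e.g. "01.13.2".
    """
    parts = isat_code.strip().split(".")
    if len(parts) < 2:
        raise ValueError("expected a dotted code like '01.13', got %r" % isat_code)
    # Keep only the first two groups (the 4 digits shared with NACE rev2).
    return ".".join(parts[:2])


print(isat2008_to_nace2("01.13.2"))
print(isat2008_to_nace2("64.19"))
```

A code that already has only 4 digits comes back unchanged, so the function can be run over every code in the dataset.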
After a short discussion with Chris Taggart at Opencorporates I discovered that Opencorporates has decided to not use the NACE rev2 codes, but another standard, the ISIC rev4 from the United Nations.
The UN, fortunately, makes available files with mappings between these two systems. So I imported that into Scraperwiki here. Then I imported files with the ISAT2008 definitions in Icelandic and English and mapped the 4 digit codes to ISIC4 here.
All that is left to do at this point is update the database with roughly 80,000 Icelandic companies and add to it the ISAT2008 code and the corresponding ISIC4 code.
Excellent. Problem solved. Opencorporates will now get an ISAT2008 code and an ISIC4 code from my scraper.
That means, for instance, that the Opencorporates entry for Landsbanki Íslands can include the ISAT2008 code (from the XML here): 64.19 and the corresponding ISIC4 code: 6419. And from there display information on the purpose of the company (in this instance “Other monetary intermediation”).
Now on to the great part.
Scraperwiki provides great API access to each scraper’s datastore. That means that we can query the data directly. The ISAT2008 data can for instance be accessed here in this way (getting the data on the previous ISAT2008 code, 64.19):
https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=jsondict&name=isat2008&query=select%20*%20from%20%60ISAT2008%60%20where%20isat2008%3D%2264.19%22
This returns a JSON object with the data. We can also get it as an HTML table, RSS feed or CSV.
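The query string in that URL is just a percent-encoded SQL statement, so it doesn’t have to be written by hand. Here is a minimal sketch that rebuilds the same URL with Python’s standard library. The helper name datastore_url is my own invention for this example, and it only builds the URL without making any request.

```python
from urllib.parse import quote, urlencode

API_URL = "https://api.scraperwiki.com/api/1.0/datastore/sqlite"


def datastore_url(name, sql, fmt="jsondict"):
    """Build a datastore query URL (hypothetical helper, no request is made)."""
    params = {"format": fmt, "name": name, "query": sql}
    # quote_via=quote encodes spaces as %20; safe="*" leaves the asterisk
    # unescaped so the result matches the URL shown above.
    return API_URL + "?" + urlencode(params, safe="*", quote_via=quote)


url = datastore_url("isat2008", 'select * from `ISAT2008` where isat2008="64.19"')
print(url)
```

Swapping in a different ISAT2008 code only means changing the SQL string, and the encoding is taken care of.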
Now I’m guessing that most people out there would be interested in the NACE rev2 / ISIC4 mappings. So here is the API for that.
This in itself is fine, but relies on a little bit of knowledge of how to construct SQL queries. So I took advantage of another awesome Scraperwiki feature, which is that from each scraper/datastore you can create a “View”. A view is simply code to display data from the datastore and it can be constructed in HTML, Python, PHP or Ruby.
Here is a view that, when fed a NACE rev2 code or an ISIC4 code, returns a JSON object with the corresponding mapping. If you want to get the ISIC4 code for the NACE rev2 code for Landsbanki Íslands, do this:
https://views.scraperwiki.com/run/nace2_to_isic4/?nace2code=64.19
(views.scraperwiki.com/run/nace2_to_isic4/?nace2code=NACE2_CODE_HERE)
Similarly, to get the NACE rev2 code from an ISIC4 code:
https://views.scraperwiki.com/run/nace2_to_isic4/?isic4code=6419
(views.scraperwiki.com/run/nace2_to_isic4/?isic4code=ISIC4_CODE_HERE)
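For bulk lookups it can help to generate these URLs from a list of codes. The sketch below only builds the two URL patterns shown above. The function name mapping_url is hypothetical, and since the exact shape of the returned JSON isn’t described here, the example stops short of fetching or parsing anything.

```python
from urllib.parse import urlencode

VIEW_URL = "https://views.scraperwiki.com/run/nace2_to_isic4/"


def mapping_url(nace2code=None, isic4code=None):
    """Build the view URL for one code (hypothetical helper, no request is made)."""
    # Exactly one of the two parameters must be given.
    if (nace2code is None) == (isic4code is None):
        raise ValueError("pass exactly one of nace2code or isic4code")
    if nace2code is not None:
        params = {"nace2code": nace2code}
    else:
        params = {"isic4code": isic4code}
    return VIEW_URL + "?" + urlencode(params)


print(mapping_url(nace2code="64.19"))
print(mapping_url(isic4code="6419"))
```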
This can be used to bulk match codes between these two systems, for example in Google Refine (example here).
I then made another view that can quickly show you the mappings, which is useful if you only want to convert a few numbers. That’s available here. Try it, I think it’s cool. The code is stolen/borrowed from Ross Jones.
So that’s it. The world has it a bit easier now converting between these systems. Hopefully someone can use this.