Monthly Archives: February 2011

Run ODBC

In some environments, creating an ODBC link is prohibited, usually due to misunderstood security concerns. But it might be that we have a way out!

Try odbcad32.

And there is a chance that this opens the screen that lets you create an ODBC link anyway.

Create a surrogate key in Oracle

Creating a surrogate key in Oracle is not entirely trivial. The issue we address works as follows. Assume we have a table with 3 records and four attributes. The first attribute is a name, the second is a surrogate key; the other attributes record a user and a datetime stamp. The surrogate key should contain a unique number that allows us to locate the record uniquely. Let us assume the table looks like:

name   SURKEY  CreatedDate  CreatedUser
tom    1
ine    2
paula

In this case, we would like to have a unique value next to “paula”.
This can be accomplished by a so-called trigger that creates such a unique value every time a record is inserted. To ensure that a unique value is retrieved, we use a sequence. The trigger has the following code:

create or replace
TRIGGER SCOTT.DWH_SAI_ID
BEFORE INSERT
ON SCOTT.TABLE_SURKEY 
REFERENCING NEW AS New OLD AS Old
FOR EACH ROW
DECLARE
tmpVar NUMBER;
BEGIN
   tmpVar := 0;
   SELECT DWH_TECH_ID.NEXTVAL INTO tmpVar FROM dual;
   :NEW.SURKEY := tmpVar;
   :NEW.CreatedDate := SYSDATE;
   :NEW.CreatedUser := USER;
END DWH_SAI_ID; 

An alternative formulation only assigns a key when none was supplied yet:

create or replace trigger scott.PK_PK 
   before insert on "SCOTT"."TABLE_SURKEY" 
   for each row 
begin  
   if inserting then
      if :NEW."SURKEY" is null then 
         select scott.PKSEQ.nextval into :NEW."SURKEY" from dual; 
      end if; 
   end if;
end;

This variant uses a sequence PKSEQ and leaves an explicitly supplied key untouched.

The first trigger takes a new (unique) value from the sequence DWH_TECH_ID and stores it in the attribute SURKEY. Along with the surrogate key, the user that inserted the record and the datetime are stored in the record.

The result is:

name   SURKEY  CreatedDate  CreatedUser
tom    1
ine    2
paula  3       2010/2/2     tom
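The mechanism can be modelled outside the database as well. Below is a rough Python sketch (not the actual PL/SQL, just an illustration) in which itertools.count plays the role of the Oracle sequence DWH_TECH_ID and a function plays the role of the BEFORE INSERT trigger:

```python
from datetime import datetime
from itertools import count

# Stand-in for the Oracle sequence DWH_TECH_ID.
surkey_seq = count(start=1)

def before_insert(row, user="tom"):
    """Mimic the trigger: fill SURKEY, CreatedDate and CreatedUser on insert."""
    row["SURKEY"] = next(surkey_seq)
    row["CreatedDate"] = datetime.now()
    row["CreatedUser"] = user
    return row

table = [before_insert({"name": n}) for n in ("tom", "ine", "paula")]
print([(r["name"], r["SURKEY"]) for r in table])
# [('tom', 1), ('ine', 2), ('paula', 3)]
```

Like the sequence, the counter guarantees that no two rows ever receive the same number, even though the caller never supplies one.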

send mail via telnet

I am subscribed to an ADSL provider that also provides email services. Hence, I could try to send some mail from the command line on my laptop.

As a first step, I started a telnet session with the provider; from the command line I issued “telnet mail.wxs.nl 25”. The server responded promptly with: “220 CPSMTPM-CMT103.kpnxchange.com kpnxchange.com Fri, 24 Dec 2010 00:03:22 +0100”.

My next command was “helo mail.wxs.nl” to greet the mail server. The reply from the mail server was: “250 CPSMTPM-CMT101.kpnxchange.com Hello [77.164.145.154]”.

As sender, I introduced: “mail from: tom.vanmaanen@wxs.nl”. Apparently I was accepted as a legitimate sender, as the server replied: “250 2.1.0 tom.vanmaanen@wxs.nl….Sender OK”.

I decided to send a mail to myself: “rcpt to: tom.vanmaanen@wxs.nl”.

To start the email, I had to issue the command “data”, upon which the server asked me to start the mail content: “354 Please start mail input.”.

The mail content started with a subject line: “Subject: test mail”. After that, the mail body could be entered: “this is a test”.

The mail content was finished with a dot “.” on a line by itself.

After this, the mail was queued for delivery. This was indicated by a response from the mail server: “250 Mail queued for delivery.”. After that I gave “quit” upon which the actual mail was sent.

What is the benefit of this exercise? At least, it shows me a way to send emails from the command line. It could also be the basis of a program that sends emails automatically: such a program only needs to write the same lines to the server. If lines like the ones above are sent, an email goes out. This could then be used as a warning mechanism.
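The manual session above can be scripted. Here is a minimal Python sketch that builds the same command sequence (host and addresses are the ones from my session; adjust them to your own provider). The resulting lines are what would be written to a socket connected to port 25:

```python
def smtp_commands(helo, sender, rcpt, subject, body):
    """Build the SMTP dialogue from the telnet session as a list of lines."""
    return [
        f"helo {helo}",
        f"mail from: {sender}",
        f"rcpt to: {rcpt}",
        "data",                 # server answers "354 Please start mail input."
        f"Subject: {subject}",
        "",                     # blank line separates headers from the body
        body,
        ".",                    # a dot on a line by itself ends the content
        "quit",
    ]

lines = smtp_commands("mail.wxs.nl", "tom.vanmaanen@wxs.nl",
                      "tom.vanmaanen@wxs.nl", "test mail", "this is a test")
print("\n".join(lines))
```

In practice one would not reimplement the dialogue at all: Python's standard smtplib module wraps exactly this protocol.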

Create my own data warehouse

I had the time to create a small example data warehouse.

As sourcing system, I used the Northwind database. This is an example database that is provided by Microsoft. It is a really nice database: about 13 tables, each loaded with between 3 and 2,200 records.

I created a set of extraction scripts for it. This produces a set of 13 files, each one a dump of one of the tables. These files are subsequently loaded into a staging database, each an upload of a previously created extraction file. This staging database is then loaded into a datavault data warehouse. For the DDL, see here.

In total, we have the following set of scripts:

  • 13 scripts to dump the 13 tables of the Northwind database
  • 13 scripts to load the files into a staging database
  • 10 scripts to load the hubs in the datavault database
  • 7 scripts to load the links in the datavault database
  • 9 scripts to load the satellites in the datavault database
These scripts can be started from a dashboard that is stored online; see here.
It was simply fun to create this. For the techies: everything is written in php and the data are stored in a MySQL database.
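The extraction step, dumping a table to a flat file, can be sketched in a few lines. My scripts are PHP against MySQL; the version below is an illustrative Python equivalent using an in-memory SQLite table as a stand-in (the table content here is made up for the example):

```python
import csv
import io
import sqlite3

# In-memory stand-in for one Northwind table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Categories (CategoryID INTEGER, CategoryName TEXT)")
con.executemany("INSERT INTO Categories VALUES (?, ?)",
                [(1, "Beverages"), (2, "Condiments")])

def dump_table(con, table):
    """Dump one table as CSV text, header row first; one such dump per table."""
    cur = con.execute(f"SELECT * FROM {table}")
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur.fetchall())
    return out.getvalue()

print(dump_table(con, "Categories"))
```

Running one such dump per table yields the 13 extraction files; the staging load is the mirror image, reading each file back in.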

Google Refine

I am subscribed to several blogs. One of them is Dylan Jones’ blog, maintained at the Data Quality Pro community. From this blog I got the suggestion to investigate Google Refine. And it was surely a worthwhile investment.

Google Refine is a tool that Google acquired to improve data quality. When used for this purpose, it is immensely powerful.

Google Refine can be downloaded here. It is a download of approximately 30 MB and can be installed on Windows, Linux and Apple platforms. Installation is straightforward, although a Java Runtime Environment is required. Also, installation is free: no costs involved, nor a registration. After launching, the application uses the web browser as its interface.

The application is able to read/write text-formatted files. If the data are stored in an Oracle database, one should extract such tables to external text files; after extraction one can load the data into Google Refine. Loading data is easily done: in the interface we notice the button “open” that reads a file. Subsequently, “export” allows you to write the data to an external file. I really appreciate such a well-thought-out interface: trivial functions such as reading and writing should be made easily accessible.

The data can be analysed by positioning the mouse at the header of a column. A small drop-down list appears with “facet”, then “text facet”. Selecting this creates a frequency overview of the column, giving a quick view of the domain of the column values. This is functionality that is also known as creating a pivot table.

From this moment on, we get access to functionality that is really awesome. The frequency overview can be analysed with a cluster analysis that clearly displays values that closely resemble each other. We know that in many tables different spellings are used to express the same thing: “Transport Accident” can also be written as “Transport accident”. A frequency diagram then contains two lines: one for “Transport Accident” and another for “Transport accident”. Google Refine detects such small differences automatically in the cluster analysis, and the deviations can also be reconciled automatically. We can then avoid the tedious manual adjustment of values.
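The text facet and the clustering step can be illustrated with a few lines of Python. This is a simplified sketch: Refine's real clustering uses smarter key functions such as fingerprinting, while here we only normalise case and whitespace:

```python
from collections import Counter, defaultdict

values = ["Transport Accident", "Transport accident",
          "Transport Accident", "Fall"]

# The "text facet": a frequency overview of the column values.
facet = Counter(values)

# A simple clustering key: lowercase and collapse whitespace,
# so near-identical spellings end up in the same cluster.
def key(v):
    return " ".join(v.lower().split())

clusters = defaultdict(list)
for v in facet:
    clusters[key(v)].append(v)

# Clusters with more than one spelling are candidates for merging.
print({k: vs for k, vs in clusters.items() if len(vs) > 1})
# {'transport accident': ['Transport Accident', 'Transport accident']}
```

Reconciling a cluster then simply means rewriting every variant spelling to the one chosen canonical value.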

Moving to Joomla

Previous pages can be found here
I had a site previously. This was basically a standard HTML-based site to which comments were added over time. The above link brings you back to those pages. But only a few days ago, I discovered Joomla. Joomla is a content management system that allows you to add comments and notes. Hence, I decided to use it. Let us see how it works out.

Hum hum

I got hacked. Snif. Sorry for Joomla, but I returned to WordPress. Hope it is safer!

Northwind database

Microsoft proposed the “Northwind Database” as an example database. It is very interesting, as it allows us to exercise with new techniques. This was also the purpose of the Northwind database: to show implementations of Microsoft applications. It can also be used to test other applications, like Cognos, MicroStrategy etc. I have rewritten the script to have it inserted into a MySQL database.

See: download.

NAS Server

Recently I bought a NAS server, just to store some files, photos and one or two movies that came across. To be honest, I did not realise how much software was bundled with this NAS. When I opened the admin page, I was almost shocked by the tons of software that came with it: MySQL, PHP, FTP, a telnet server, an HTTP server, several download apps, a media player. To see how it worked, I created a small website that gives access to a few of the installed applications. See here. Curious how long it takes before this is discovered by those idiots that sell Viagra and pr0n. I will keep you informed.