Category Archives: nice to know

Generate table in Oracle

It regularly happens to me that I want to generate a random set of records in an Oracle table. That could happen if we want to assess the performance of a certain procedure, or, as another example, if we want to estimate the size of a table. A great thing about Oracle is the random function where… Read More »

Sqoop and Hive

It is possible to use Sqoop to load directly from an RDBMS into Hive. This opens interesting possibilities. Data that are stored in an RDBMS and that need to be analysed on a cheaper platform can be migrated via Sqoop to a Hadoop platform. Sqoop is generally seen as a reliable medium to undertake such… Read More »

Hive – mapreduce extension

It is good to realise that Hive is built upon a MapReduce framework. Hive was developed by Facebook to facilitate analysis on Hadoop files. It is possible to use a SQL dialect instead of a Python or a Java programme to do your analysis. When a Hive… Read More »

Python: yet another way to implement map/reduce

In this blog, I will discuss the word count problem as done with Python. It is often used to show how MapReduce works. In most examples, it is developed within the context of a Java programme. The idea is that the programme is split into two stages. In one stage, calculations are made on… Read More »
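As a minimal sketch of the two-stage idea, the word count can be written in plain Python as below. The function names `mapper`, `reducer` and `word_count` are hypothetical and chosen for illustration; the sketch runs in a single process and is not the code from the full post, nor does it use Hadoop.

```python
from collections import defaultdict


def mapper(lines):
    """Map stage: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce stage: sum the counts that belong to the same word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Chain the two stages: the reducer consumes what the mapper emits."""
    return reducer(mapper(lines))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(sample))
```

On a real cluster the pairs would be sorted and distributed between the two stages; here the reducer simply groups them in a dictionary.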

ODBC and Hive

In my view, the new development that we see now is building links to a Hadoop platform. One such development is building ODBC drivers that allow Windows tools to access a Hadoop platform. As an example, one may think of Excel accessing tables on Hive. Think for a second about the possibilities: one may… Read More »

Pig revisited

Recently, I revisited Pig. Pig is a language that allows you to analyse data sets in a Hadoop environment. It was created at Yahoo to circumvent the technicalities of creating a MapReduce Java job. Yahoo claims that most of its queries on a Hadoop platform can be replaced by a Pig script. As Pig is… Read More »

Oops, how much tablespace is left?

A few days ago, I was asked to load some tables in Oracle. A rather trivial request, but I wasn’t sure if enough tablespace was left. From the table definition, I found out which tablespace was used. After that I ran the query below to see how much tablespace was actually left. I want to… Read More »

Pushing files via Netcat

Netcat is a Unix utility to investigate network connections. It has now been ported to Windows, where it allows us to query network connections with netcat (nc). A nice possibility is to push files via nc from one machine to another. Assume for the moment that both machines have netcat… Read More »

Flume

Flume allows messages to be transferred directly into a file. It even allows such files to be stored on Hadoop. This opens a way to capture messages in a file that is stored on Hadoop, ready to be analysed. The example is a series of events from a log that are collected. The file is then… Read More »

Serialise

I encountered the term “serialise”. But what does it mean? I understood the term “serialise” when I read a comment that explained that data structures can be created inside, say, PHP. One may think of an object or an array. Such data structures can only be used inside PHP and they cannot be transported outside… Read More »
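To make the idea concrete, here is a minimal sketch in Python rather than PHP (an assumption made for illustration; the full post talks about PHP). It uses the standard `json` module to turn an in-memory structure into a string that can leave the programme, and back again. The helper names `serialise` and `deserialise` are hypothetical.

```python
import json


def serialise(data):
    """Turn an in-memory structure into a plain string that can be stored or sent."""
    return json.dumps(data, sort_keys=True)


def deserialise(text):
    """Rebuild the in-memory structure from its string form."""
    return json.loads(text)


# A dictionary with a nested list only exists inside the running programme.
record = {"name": "example", "values": [1, 2, 3]}

# After serialising, it is ordinary text that can be written to a file or a socket.
text = serialise(record)

# The receiving side turns the text back into an equivalent structure.
restored = deserialise(text)
```

JSON is only one possible format; it was picked here because it is readable and available in the standard library.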