Monthly Archives: August 2014

Yet another checklist

I often come across checklists. Such checklists are used to assess the quality of, say, a database. See here for an example of such a checklist. In this case, we have a checklist that is created to assess the quality of a Teradata database.
Such checklist are very interesting. They are generally based on a very good understanding of the subject area. Moreover, such knowledge is written in a very concise form which provides the reader with a quick insight as to whether the list might benefit the reader.

Teradata: what columns to choose as Primary Index?

A major question with Teradata tables is what columns to choose when a primary index must be created.
I understand that 4 different arguments might be used:
1: Is the column often accessed? If it is often accessed, best usage is made of the distibution of records over the different amps.
2: Is the column used in joins: in that case a similar argument holds. If the column is used in a join condition, usage is made of the distribution of records over the different amps.

 
TERADATA_FALLBACK

 
3: Is the column often updated. If a culumn is updated, the record might be re-allocated to another amp. Hence, the less updates, the better it is to have it included in a primary index.
4: Is the cardinality high. In that case, it is possible to distribute the records over many amps, enabling Teradata to make optimal usage of the primary index capability.

Teradata and fall back

I understand Teradata has a so-called fall back option. The idea is that data are stored twice; each record being stored to two different amps.

I saw a nice picture that descibes the situation. Each record (Row 1, Row 2 etc) are linked to an amp. It is shown in below scheme as yellow boxes. A seperate copy is made of all records. The copy is shown in an orange box. The records are all linked to another amp as the original record.

TERADATA_FALLBACK

Having a fall back, certainly has advantages. If an amp fails, a query may use copies of the original records.

However, there is a flipside to this. It requires:

  • more table space as copies must be stored
  • more IO time as the copies must be written to disc
  • more CPU time as he copies must be created by the system

Hence, this option is only used for critical tables where the additional costs are justified. Moreover, this option is advised to be used at table level only. This allows to make a fine distinction between tables for whih such additional costs are warrented and other tables where additional costs are not justified.