The [NDBD] section is used to configure the
behavior of the cluster's data nodes. There are many parameters
which control buffer sizes, pool sizes, timeouts, and so forth. The
only mandatory parameters are:
Either ExecuteOnComputer or
HostName.
The parameter NoOfReplicas
These need to be defined in the [NDBD DEFAULT]
section.
Most data node parameters are set in the [NDBD
DEFAULT] section. Only those parameters explicitly stated
as being able to set local values are allowed to be changed in the
[NDBD] section. HostName,
Id and ExecuteOnComputer
must be defined in the local
[NDBD] section.
Identifying Data Nodes
The Id value (that is, the data node
identifier) can be allocated on the command line when the node is
started or in the configuration file.
For each parameter it is possible to use k,
M, or G as a suffix to
indicate units of 1024, 1024*1024, or 1024*1024*1024. (For example,
100k means 100 * 1024 = 102400.) Parameters and
values are currently case-sensitive.
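The suffix rule can be illustrated with a short Python sketch. The helper name parse_size is hypothetical and is not part of MySQL Cluster; the sketch assumes that a value without a suffix is a plain integer and that suffixes are matched case-sensitively, as noted above.

```python
# Multipliers for the k, M, and G suffixes described above.
SUFFIXES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a value such as '100k' or '80M' to a plain integer.

    Hypothetical helper for illustration only. An unrecognized
    suffix makes int() raise ValueError.
    """
    value = value.strip()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


print(parse_size("100k"))  # 102400
```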
[NDBD]Id
This is the node ID used as the address of the node for all cluster internal messages. This is an integer between 1 and 63. Each node in the cluster has a unique identity.
[NDBD]ExecuteOnComputer
This refers to one of the computers (hosts) defined in the
COMPUTER section.
[NDBD]HostName
Specifying this parameter has an effect similar to specifying
ExecuteOnComputer. It defines the hostname of
the computer on which the storage node is to reside. Either this
parameter or ExecuteOnComputer is required.
[NDBD]ServerPort
Each node in the cluster uses one port on which other nodes connect in order to set up transporters. This port is also used for non-TCP transporters during the connection setup phase. The default port is calculated so that no two nodes on the same computer receive the same port number.
[NDBD]NoOfReplicas
This global parameter can be set only in the [NDBD
DEFAULT] section, and defines the number of replicas for
each table stored in the cluster. This parameter also specifies
the size of node groups. A node group is a set of nodes that all
store the same information.
Node groups are formed implicitly. The first node group is formed
by the set of data nodes with the lowest node IDs, the next node
group by the set of the next lowest node identities, and so on. By
way of example, assume that we have 4 data nodes and that
NoOfReplicas is set to 2. The four data nodes
have node IDs 2, 3, 4 and 5. Then the first node group is formed
from nodes 2 and 3, and the second node group by nodes 4 and 5. It
is important to configure the cluster in such a manner that nodes
in the same node groups are not placed on the same computer, as in
this situation a single hardware failure would cause the entire
cluster to crash.
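The implicit grouping described above can be sketched in Python as follows. The function name node_groups is hypothetical, and the sketch assumes that the number of data nodes is an exact multiple of NoOfReplicas.

```python
def node_groups(node_ids, no_of_replicas):
    """Group data nodes by ascending node ID, no_of_replicas per group.

    Hypothetical helper mirroring the rule in the text: the lowest
    node IDs form the first node group, the next lowest the second,
    and so on.
    """
    ids = sorted(node_ids)
    if len(ids) % no_of_replicas != 0:
        # Assumption: incomplete node groups are treated as an error.
        raise ValueError("number of data nodes must be a multiple of NoOfReplicas")
    return [ids[i:i + no_of_replicas] for i in range(0, len(ids), no_of_replicas)]


print(node_groups([2, 3, 4, 5], 2))  # [[2, 3], [4, 5]]
```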
If no node IDs are provided then the order of the data nodes will
be the determining factor for the node group. Whether or not
explicit assignments are made, they can be viewed in the output of
the management client's SHOW command.
There is no default value for NoOfReplicas; the
maximum possible value is 4.
[NDBD]DataDir
This parameter specifies the directory where trace files, log files, pid files and error logs are placed.
[NDBD]FileSystemPath
This parameter specifies the directory where all files created for
metadata, REDO logs, UNDO logs and data files are placed. The
default is the directory specified by DataDir.
Note: This directory must exist
before the ndbd process is initiated.
The recommended directory hierarchy for MySQL Cluster includes
/var/lib/mysql-cluster, under which a
directory for the node's filesystem is created. This subdirectory
contains the node ID. For example, if the node ID is 2, then this
subdirectory is named ndb_2_fs.
[NDBD]BackupDataDir
It is also possible to specify the directory in which backups are
placed. By default, this directory is
FileSystemPath/BACKUP. (See above.)
Data Memory and Index Memory
DataMemory and IndexMemory
are parameters specifying the size of memory segments used to store
the actual records and their indexes. In setting values for these,
it is important to understand how DataMemory and
IndexMemory are used, as they usually need to be
updated in order to reflect actual usage by the cluster:
[NDBD]DataMemory
This parameter defines the amount of space available to store database records. The entire amount will be allocated in memory, so it is extremely important that the machine has sufficient physical memory to accommodate this value.
The memory allocated by DataMemory is used to
store both the actual records and indexes. Each record is
currently of fixed size. (Even VARCHAR columns
are stored as fixed-width columns.) There is a 16-byte overhead on
each record; an additional amount for each record is incurred
because it is stored in a 32KB page with 128 byte page overhead
(see below). There is also a small amount wasted per page due to
the fact that each record is stored in only one page. The maximum
record size is currently 8052 bytes.
The memory space defined by DataMemory is also
used to store ordered indexes, which use about 10 bytes per
record. Each table row is represented in the ordered index. A
common error among users is to assume that all indexes are stored
in the memory allocated by IndexMemory, but
this is not the case: only primary key and unique hash indexes use
this memory; ordered indexes use the memory allocated by
DataMemory. However, creating a primary key or
unique hash index also creates an ordered index on the same keys,
unless you specify USING HASH in the index
creation statement. This can be verified by running
ndb_desc -d db_name
table_name in the management
client.
The memory space allocated by DataMemory
consists of 32KB pages, which are allocated to table fragments.
Each table is normally partitioned into the same number of
fragments as there are data nodes in the cluster. Thus, for each
node, there are the same number of fragments as are set in
NoOfReplicas. Once a page has been allocated,
it is currently not possible to return it to the pool of free
pages, except by deleting the table. Performing a node recovery
also will compress the partition because all records are inserted
into empty partitions from other live nodes.
The DataMemory memory space also contains UNDO
information: For each update, a copy of the unaltered record is
allocated in the DataMemory. There is also a
reference to each copy in the ordered table indexes. Unique hash
indexes are updated only when the unique index columns are
updated, in which case a new entry in the index table is inserted
and the old entry is deleted upon commit. For this reason, it is
also necessary to allocate enough memory to handle the largest
transactions performed by applications using the cluster. In any
case, performing a few large transactions holds no advantage over
using many smaller ones, for the following reasons:
Large transactions are not any faster than smaller ones
Large transactions increase the number of operations that are lost and must be repeated in the event of transaction failure
Large transactions use more memory
The default value for DataMemory is 80MB; the
minimum is 1MB. There is no maximum size, but in reality the
maximum size has to be adapted so that the process does not start
swapping when the limit is reached. This limit is determined by
the amount of physical RAM available on the machine and by the
amount of memory that the operating system may commit to any one
process. 32-bit operating systems are generally limited to 2-4GB
per process; 64-bit operating systems can use more. For large
databases, it may be preferable to use a 64-bit operating system
for this reason. In addition, it is also possible to run more than
one ndbd process per machine, and this may
prove advantageous on machines with multiple CPUs.
[NDBD]IndexMemory
This parameter controls the amount of storage used for hash indexes in MySQL Cluster. Hash indexes are always used for primary key indexes, unique indexes, and unique constraints. Note that when defining a primary key and a unique index, two indexes will be created, one of which is a hash index used for all tuple accesses as well as lock handling. It is also used to enforce unique constraints.
The size of the hash index is 25 bytes per record, plus the size of the primary key. For primary keys larger than 32 bytes another 8 bytes is added.
Consider a table defined by
CREATE TABLE example ( a INT NOT NULL, b INT NOT NULL, c INT NOT NULL, PRIMARY KEY(a), UNIQUE(b) ) ENGINE=NDBCLUSTER;
There are 12 bytes overhead (having no nullable columns saves 4 bytes of overhead) plus 12 bytes of data per record. In addition we have two ordered indexes on columns a and b consuming about 10 bytes each per record. There is a primary key hash index on the base table with roughly 29 bytes per record. The unique constraint is implemented by a separate table with b as primary key and a as a column. This table will consume another 29 bytes of index memory per record in the example table as well as 12 bytes of overhead, plus 8 bytes of record data.
Thus, for one million records, we need 58 MB for index memory to handle the hash indexes for the primary key and the unique constraint. We also need 64 MB for the records of the base table and the unique index table, plus the two ordered index tables.
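The arithmetic behind these two figures can be reproduced with a short Python sketch. The per-record sizes are taken from the text above; the sketch assumes 4-byte INT columns and takes 1 MB to mean 1,000,000 bytes, which is what the rounded figures imply.

```python
ROWS = 1_000_000
INT_SIZE = 4          # bytes per INT column (12 bytes for three columns)
RECORD_OVERHEAD = 12  # bytes per record when there are no nullable columns
HASH_INDEX_BASE = 25  # bytes per record for a hash index, excluding the key
ORDERED_INDEX = 10    # approximate bytes per record for an ordered index

# Index memory: one hash index for PRIMARY KEY(a), one for UNIQUE(b).
hash_entry = HASH_INDEX_BASE + INT_SIZE           # 29 bytes
index_bytes_per_row = 2 * hash_entry              # 58 bytes

# Data memory: base table, unique index table, and two ordered indexes.
base_table_row = RECORD_OVERHEAD + 3 * INT_SIZE   # 24 bytes (a, b, c)
unique_table_row = RECORD_OVERHEAD + 2 * INT_SIZE # 20 bytes (b, a)
ordered_indexes = 2 * ORDERED_INDEX               # 20 bytes
data_bytes_per_row = base_table_row + unique_table_row + ordered_indexes

print(index_bytes_per_row * ROWS / 1e6)  # 58.0 (MB of IndexMemory)
print(data_bytes_per_row * ROWS / 1e6)   # 64.0 (MB of DataMemory)
```

These totals ignore page overhead and per-page waste, so actual usage can be expected to be somewhat higher.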
You can see that hash indexes take up a fair amount of memory space; however, they provide very fast access to the data in return. They are also used in MySQL Cluster to handle uniqueness constraints.
Currently the only partitioning algorithm is hashing and ordered indexes are local to each node. Thus ordered indexes cannot be used to handle uniqueness constraints in the general case.
An important point for both IndexMemory and
DataMemory is that the total database size is
the sum of all data memory and all index memory for each node
group. Each node group is used to store replicated information, so
if there are four nodes with 2 replicas, then there will be two
node groups. Thus, the total data memory available is
2*DataMemory for each data node.
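As a rough illustration of this calculation, the following Python sketch derives the number of node groups and the resulting capacity. The function name cluster_capacity is hypothetical, and the sketch assumes that the number of data nodes is an exact multiple of NoOfReplicas.

```python
def cluster_capacity(per_node_memory, data_nodes, no_of_replicas):
    """Return (node_groups, total_memory) for the cluster.

    Hypothetical helper: every node in a node group stores the same
    data, so capacity grows with the number of node groups rather
    than with the number of nodes.
    """
    groups = data_nodes // no_of_replicas
    return groups, groups * per_node_memory


# Four data nodes, two replicas, DataMemory of 80 (MB) on each node.
print(cluster_capacity(80, 4, 2))  # (2, 160)
```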
It is highly recommended that DataMemory and
IndexMemory be set to the same values for all
nodes. Since data is distributed evenly over all nodes in the
cluster the maximum amount of space available is no better than
that of the smallest node in the cluster.
DataMemory and IndexMemory
can be changed, but decreasing either of these can be risky; doing
so can easily lead to a node or even an entire Cluster that is
unable to restart due to there being insufficient memory space.
Increasing these values should be quite okay, but it is
recommended that such upgrades are performed in the same manner as
a software upgrade, beginning with an update of the configuration
file, then restarting the management server followed by restarting
each data node in turn.
Updates do not increase the amount of index memory used. Inserts take effect immediately; however, rows are not actually deleted until the transaction is committed.
The default value for IndexMemory is 18MB. The
minimum is 1MB.
Transaction Parameters
The next three parameters which we discuss are important because
they affect the number of parallel transactions and the sizes of
transactions that can be handled by the system.
MaxNoOfConcurrentTransactions sets the number of
parallel transactions possible in a node;
MaxNoOfConcurrentOperations sets the number of
records that can be in update phase or locked simultaneously.
Both of these parameters (especially
MaxNoOfConcurrentOperations) are likely targets
for users setting specific values and not using the default value.
The default value is set for systems using small transactions, in
order to ensure that these do not use excessive memory.
[NDBD]MaxNoOfConcurrentTransactions
For each active transaction in the cluster there must be a record in one of the cluster nodes. The task of co-ordinating transactions is spread amongst the nodes: the total number of transaction records in the cluster is the number of transactions in any given node times the number of nodes in the cluster.
Transaction records are allocated to individual MySQL servers. Normally there is at least one transaction record allocated per connection that uses any table in the cluster. For this reason, one should ensure that there are more transaction records in the cluster than there are concurrent connections to all MySQL servers in the cluster.
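A minimal Python sketch of this sizing check follows. The function names are hypothetical, and the sketch assumes that "nodes" refers to the data nodes of the cluster.

```python
def total_transaction_records(max_concurrent_transactions, data_nodes):
    """Transaction records available across the whole cluster."""
    return max_concurrent_transactions * data_nodes


def has_enough_records(max_concurrent_transactions, data_nodes, connections):
    """True if there are more records than concurrent connections."""
    total = total_transaction_records(max_concurrent_transactions, data_nodes)
    return total > connections


# 4096 records per node on a 4-node cluster, 2000 client connections.
print(total_transaction_records(4096, 4))  # 16384
print(has_enough_records(4096, 4, 2000))   # True
```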
This parameter must be set to the same value for all cluster nodes.
Changing this parameter is never safe and can cause a cluster to crash. When a node crashes, one of the nodes (actually the oldest surviving node) will build up the transaction state of all transactions ongoing in the crashed node at the time of the crash. It is thus important that this node has as many transaction records as the failed node.
The default value for this parameter is 4096.
[NDBD]MaxNoOfConcurrentOperations
It is a good idea to adjust the value of this parameter according to the size and number of transactions. When performing transactions of only a few operations each and not involving a great many records, there is no need to set this parameter very high. When performing large transactions involving many records, you need to set this parameter higher.
Records are kept for each transaction updating cluster data, both in the transaction co-ordinator and in the nodes where the actual updates are performed. These records contain state information needed in order to find UNDO records for rollback, lock queues, and other purposes.
This parameter should be set to the number of records to be updated simultaneously in transactions, divided by the number of cluster data nodes. For example, in a cluster which has 4 data nodes and which is expected to handle 1,000,000 concurrent updates using transactions, you should set this value to 1000000 / 4 = 250000.
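The rule of thumb above amounts to a single division, sketched here in Python. The function name is hypothetical, and rounding up when the division is not exact is an assumption made so that the value is never too small.

```python
def max_concurrent_operations(concurrent_updates, data_nodes):
    """Suggested MaxNoOfConcurrentOperations for an expected load.

    Hypothetical helper; uses ceiling division so that any remainder
    rounds the result up.
    """
    return -(-concurrent_updates // data_nodes)


print(max_concurrent_operations(1_000_000, 4))  # 250000
```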
Read queries which set locks also cause operation records to be created. Some extra space is allocated in the individual nodes to accommodate cases where the distribution is not perfect over the nodes.
When queries make use of the unique hash index, there are actually two operation records used per record in the transaction. The first record represents the read in the index table and the second handles the operation on the base table.
The default value for this parameter is 32768.
This parameter actually handles two values that can be configured separately. The first of these specifies how many operation records are to be placed with the transaction co-ordinator. The second part specifies how many operation records are to be local to the database.
A very large transaction performed on an 8-node cluster requires as
many operation records in the transaction co-ordinator as there
are reads, updates, and deletes involved in the transaction.
However, the operation records of the transaction are spread over all 8 nodes.
Thus, if it is necessary to configure the system for one very
large transaction, then it is a good idea to configure the two
parts separately. MaxNoOfConcurrentOperations
will always be used to calculate the number of operation records
in the transaction co-ordinator portion of the node.
It is also important to have an idea of the memory requirements for operation records. In MySQL 4.1, these consume about 1KB per record. This figure is expected to be reduced in MySQL 5.x.
[NDBD]MaxNoOfLocalOperations
By default, this parameter is calculated as 1.1 *
MaxNoOfConcurrentOperations which fits systems
with many simultaneous transactions, none of them being very
large. If there is a need to handle one very large transaction at
a time and there are many nodes, then it is a good idea to
override the default value by explicitly specifying this
parameter.
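The default calculation can be sketched in Python as follows. The function name is hypothetical, and rounding the result down to a whole number is an assumption, since the text does not say how a fractional result is handled.

```python
def default_local_operations(max_concurrent_operations):
    """Default MaxNoOfLocalOperations: 1.1 * MaxNoOfConcurrentOperations.

    Integer arithmetic avoids floating-point rounding; the result is
    rounded down (assumption).
    """
    return max_concurrent_operations * 11 // 10


print(default_local_operations(32768))  # 36044
```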
Transaction Temporary Storage
The next set of parameters is used to determine temporary storage when executing a query which is part of a Cluster transaction. All records are released when the query is completed and the cluster is waiting for the commit or rollback.
The default values for these parameters are adequate for most situations. However, users with a need to support transactions involving large numbers of rows or operations may need to increase these to enable better parallelism in the system, while users whose applications require relatively small transactions can decrease the values in order to save memory.
[NDBD]MaxNoOfConcurrentIndexOperations
For queries using a unique hash index, another temporary set of
operation records is used during a query's execution phase. This
parameter sets the size of that pool of records. Such a record
is allocated only while a part of a query is being executed; as soon
as this part has been executed, the record is released. The state
needed to handle aborts and commits is handled by the normal
operation records, whose pool size is set by the parameter
MaxNoOfConcurrentOperations.
The default value of this parameter is 8192. Only in rare cases of extremely high parallelism using unique hash indexes should it be necessary to increase this value. Using a smaller value is possible and can save memory if the DBA is certain that a high degree of parallelism is not required for the cluster.
[NDBD]MaxNoOfFiredTriggers
The default value of MaxNoOfFiredTriggers is
4000, which is sufficient for most situations. In some cases it
can even be decreased if the DBA feels certain the need for
parallelism in the cluster is not high.
A record is created when an operation is performed that affects a unique hash index. Inserting or deleting a record in a table with unique hash indexes or updating a column that is part of a unique hash index fires an insert or a delete in the index table. The resulting record is used to represent this index table operation while waiting for the original operation that fired it to complete. This operation is short lived but can still require a large number of records in its pool for situations with many parallel write operations on a base table containing a set of unique hash indexes.
[NDBD]TransactionBufferMemory
The memory affected by this parameter is used for tracking operations fired when updating index tables and reading unique indexes. This memory is used to store the key and column information for these operations. It is only very rarely that the value for this parameter needs to be altered from the default.
Normal read and write operations use a similar buffer, whose usage
is even more short-lived. The compile-time parameter
ZATTRBUF_FILESIZE (found in
ndb/src/kernel/blocks/Dbtc/Dbtc.hpp) is set to
4000*128 bytes (500KB). A similar buffer for key info,
ZDATABUF_FILESIZE (also in
Dbtc.hpp) contains 4000*16 bytes (62.5KB) of buffer
space. Dbtc is the module which handles
transaction co-ordination.
Scans and Buffering
There are additional parameters in the Dblqh
module (in
ndb/src/kernel/blocks/Dblqh/Dblqh.hpp) which
affect reads and updates. These include
ZATTRINBUF_FILESIZE, set by default to
10000*128 bytes (1250KB) and
ZDATABUF_FILE_SIZE, set by default to 10000*16
bytes (roughly 156KB) of buffer space. To date, there have been
neither any reports from users nor any results from our own
extensive tests suggesting that either of these compile-time
limits should be increased.
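The sizes quoted for these compile-time buffers can be checked with a few lines of Python. The sketch assumes that each product is a number of entries times an entry size in bytes, and that 1KB is 1024 bytes, which is consistent with the figures given in the text.

```python
# (entries, bytes per entry) as quoted in the text; the labels are the
# constant names from the text and are used here only as dictionary keys.
BUFFERS = {
    "ZATTRBUF_FILESIZE (Dbtc)": (4000, 128),
    "ZDATABUF_FILESIZE (Dbtc)": (4000, 16),
    "ZATTRINBUF_FILESIZE (Dblqh)": (10000, 128),
    "ZDATABUF_FILE_SIZE (Dblqh)": (10000, 16),
}


def buffer_kb(entries, entry_bytes):
    """Buffer size in KB, taking 1KB as 1024 bytes (assumption)."""
    return entries * entry_bytes / 1024


for name, (entries, entry_bytes) in BUFFERS.items():
    print(name, buffer_kb(entries, entry_bytes), "KB")
```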
The default value for TransactionBufferMemory
is 1MB.
[NDBD]MaxNoOfConcurrentScans
This parameter is used to control the number of parallel scans
that can be performed in the cluster. Each transaction
co-ordinator can handle the number of parallel scans defined for
this parameter. Each scan query is performed by scanning all
partitions in parallel. Each partition scan uses a scan record in
the node where the partition is located, the number of records
being the value of this parameter times the number of nodes. The
cluster should be able to sustain
MaxNoOfConcurrentScans scans concurrently from
all nodes in the cluster.
Scans are actually performed in two cases. The first of these cases occurs when no hash or ordered indexes exist to handle the query, in which case the query is executed by performing a full table scan. The second case is encountered when there is no hash index to support the query but there is an ordered index. Using the ordered index means executing a parallel range scan. Since the order is kept on the local partitions only, it is necessary to perform the index scan on all partitions.
The default value of MaxNoOfConcurrentScans is
256. The maximum value is 500.
This parameter specifies the number of scans possible in the
transaction co-ordinator. If the number of local scan records is
not provided, it is calculated as the product of
MaxNoOfConcurrentScans and the number of data
nodes in the system.
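The default number of local scan records can be sketched in Python as follows. The function name is hypothetical; the upper limit of 500 is the maximum stated above.

```python
MAX_CONCURRENT_SCANS_LIMIT = 500  # maximum value stated in the text


def default_local_scans(max_concurrent_scans, data_nodes):
    """Local scan records used when MaxNoOfLocalScans is not set."""
    if max_concurrent_scans > MAX_CONCURRENT_SCANS_LIMIT:
        raise ValueError("MaxNoOfConcurrentScans cannot exceed 500")
    return max_concurrent_scans * data_nodes


# Default of 256 concurrent scans on a cluster with 4 data nodes.
print(default_local_scans(256, 4))  # 1024
```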
[NDBD]MaxNoOfLocalScans
Specifies the number of local scan records if many scans are not fully parallelized.
[NDBD]BatchSizePerLocalScan
This parameter is used to calculate the number of lock records needed to handle many concurrent scan operations.
The default value is 64; this value has a strong connection to the
ScanBatchSize defined in the SQL nodes.
[NDBD]LongMessageBuffer
This is an internal buffer used for passing messages within individual nodes and between nodes. While it is highly unlikely that this would need to be changed, it is configurable. By default it is set to 1MB.
Logging and Checkpointing
[NDBD]NoOfFragmentLogFiles
This parameter sets the size of the node's REDO log files. REDO log files are organized in a ring. It is extremely important that the first and last log files (sometimes referred to as the "head" and "tail" log files, respectively) do not meet; when these approach one another too closely, the node will begin aborting all transactions encompassing updates due to a lack of room for new log records.
A REDO log record is not removed until three local checkpoints have been completed since that log record was inserted. Checkpointing frequency is determined by its own set of configuration parameters discussed elsewhere in this chapter.
The default parameter value is 8, which means 8 sets of 4 16MB
files for a total of 512MB. In other words, REDO log space must be
allocated in blocks of 64MB. In scenarios requiring a great many
updates, the value for NoOfFragmentLogFiles may
need to be set as high as 300 or even higher in order to provide
sufficient space for REDO logs.
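The relationship between the parameter value and the REDO log space can be sketched in Python. The function name is hypothetical; the constants are those given above.

```python
FILES_PER_SET = 4   # each set consists of 4 files
FILE_SIZE_MB = 16   # each file is 16MB


def redo_log_mb(no_of_fragment_log_files):
    """Total REDO log space in MB for a NoOfFragmentLogFiles value."""
    return no_of_fragment_log_files * FILES_PER_SET * FILE_SIZE_MB


print(redo_log_mb(8))    # 512
print(redo_log_mb(300))  # 19200
```

A value of 300 therefore corresponds to roughly 19GB of disk space on each data node.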
If the checkpointing is slow and there are so many writes to the
database that the log files are full and the log tail cannot be
cut without jeopardizing recovery, then all updating transactions
will be aborted with internal error code 410, or Out of
log file space temporarily. This condition will prevail
until a checkpoint has completed and the log tail can be moved
forward.
[NDBD]MaxNoOfSavedMessages
This parameter sets the maximum number of trace files that will be kept before overwriting old ones. Trace files are generated when, for whatever reason, the node crashes.
The default is 25 trace files.
Metadata Objects
The next set of parameters defines pool sizes for metadata objects. It is necessary to define the maximum number of attributes, tables, indexes, and trigger objects used by indexes, events, and replication between clusters.
[NDBD]MaxNoOfAttributes
Defines the number of attributes that can be defined in the cluster.
The default value for this parameter is 1000, with the minimum possible value being 32. There is no maximum. Each attribute consumes around 200 bytes of storage per node due to the fact that all metadata is fully replicated on the servers.
[NDBD]MaxNoOfTables
A table object is allocated for each table, unique hash index, and ordered index. This parameter sets the maximum number of table objects for the cluster as a whole.
For each attribute that has a BLOB data type an
extra table is used to store most of the BLOB
data. These tables also must be taken into account when defining
the total number of tables.
The default value of this parameter is 128. The minimum is 8 and the maximum is 1600. Each table object consumes approximately 20KB per node.
[NDBD]MaxNoOfOrderedIndexes
For each ordered index in the cluster, an object is allocated describing what is being indexed and its storage segments. By default each index so defined also defines an ordered index. Each unique index and primary key has both an ordered index and a hash index.
The default value of this parameter is 128. Each object consumes approximately 10KB of data per node.
[NDBD]MaxNoOfUniqueHashIndexes
For each unique index that is not a primary key, a special table
is allocated that maps the unique key to the primary key of the
indexed table. By default, an ordered index is also defined
for each unique index. To prevent this, you must use the
USING HASH option when defining the unique
index.
The default value is 64. Each index consumes approximately 15KB per node.
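To get a feel for what these pool sizes cost, the following Python sketch adds up the approximate per-node memory for the metadata pools described so far. The per-object sizes are the approximate figures from the text; the function name is hypothetical, and treating 1KB as 1024 bytes is an assumption.

```python
KB = 1024  # assumption: the KB figures in the text are multiples of 1024 bytes

# Approximate cost per object on each node, as quoted in the text.
BYTES_PER_OBJECT = {
    "MaxNoOfAttributes": 200,
    "MaxNoOfTables": 20 * KB,
    "MaxNoOfOrderedIndexes": 10 * KB,
    "MaxNoOfUniqueHashIndexes": 15 * KB,
}

# Default values quoted in the text.
DEFAULTS = {
    "MaxNoOfAttributes": 1000,
    "MaxNoOfTables": 128,
    "MaxNoOfOrderedIndexes": 128,
    "MaxNoOfUniqueHashIndexes": 64,
}


def metadata_bytes_per_node(settings):
    """Approximate metadata pool memory per node, in bytes."""
    return sum(BYTES_PER_OBJECT[name] * count for name, count in settings.items())


print(metadata_bytes_per_node(DEFAULTS))  # 5115200
```

With the default values, the pools therefore account for roughly 5MB on each node.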
[NDBD]MaxNoOfTriggers
Internal update, insert, and delete triggers are allocated for each unique hash index. (This means that 3 triggers are created for each unique hash index.) However, an ordered index requires only a single trigger object. Backups also use three trigger objects for each normal table in the cluster.
Note: When replication between clusters is supported, this will also make use of internal triggers.
This parameter sets the maximum number of trigger objects in the cluster.
The default value for this parameter is 768.
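A rough lower bound for this parameter can be estimated from the counts given above, as in the following Python sketch. The function name is hypothetical, and adding the backup triggers to the index triggers (rather than counting them separately) is an assumption intended to cover the case where a backup runs while the indexes are in use.

```python
def triggers_needed(tables, unique_hash_indexes, ordered_indexes):
    """Estimate the number of trigger objects required.

    3 per unique hash index, 1 per ordered index, and 3 per normal
    table for backups, as described in the text.
    """
    return 3 * unique_hash_indexes + ordered_indexes + 3 * tables


# One table with one unique hash index and two ordered indexes.
print(triggers_needed(tables=1, unique_hash_indexes=1, ordered_indexes=2))  # 8
```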
[NDBD]MaxNoOfIndexes
This parameter was deprecated in MySQL 4.1.5. In MySQL 4.1.6 and
newer versions, you should use
MaxNoOfOrderedIndexes and
MaxNoOfUniqueHashIndexes instead.
This parameter is used only by unique hash indexes. There needs to be one record in this pool for each unique hash index defined in the cluster.
The default value of this parameter is 128.
Boolean Parameters
The behavior of data nodes is also affected by a set of parameters taking on boolean values. These parameters can be specified as TRUE by setting them equal to 1 or Y, and as FALSE by setting them equal to 0 or N.
[NDBD]LockPagesInMainMemory
For a number of operating systems, including Solaris and Linux, it is possible to lock a process into memory and so avoid any swapping to disk. This can be used to help guarantee the cluster's real-time characteristics.
By default, this feature is disabled.
[NDBD]StopOnError
This parameter specifies whether an NDBD process is to exit or to perform an automatic restart when an error condition is encountered.
This feature is enabled by default.
[NDBD]Diskless
It is possible to specify Cluster tables as diskless, meaning that tables are not checkpointed to disk and that no logging occurs. Such tables exist only in main memory. A consequence of using diskless tables is that neither the tables nor the records in those tables will be preserved after a crash. However, when operating in diskless mode, it is possible to run ndbd on a diskless computer.
This feature makes the entire cluster operate in diskless mode. We plan to add the capability to employ diskless mode on a per-table basis in a future (5.0 or 5.1) release of MySQL Cluster.
Currently, when this feature is enabled, backups are performed but backup data is not actually stored. In a future release we hope to make diskless backup a separate, configurable parameter.
This feature is enabled by setting Diskless to
either 1 or
y. Diskless is
disabled by default.
[NDBD]RestartOnErrorInsert
This feature is accessible only when building the debug version where it is possible to insert errors in the execution of individual blocks of code as part of testing.
By default, this feature is disabled.
Controlling Timeouts, Intervals, and Disk Paging
There are a number of parameters specifying timeouts and intervals between various actions in Cluster data nodes. Most of the timeout values are specified in milliseconds. Any exceptions to this will be mentioned below.
[NDBD]TimeBetweenWatchDogCheck
To prevent the main thread from getting stuck in an endless loop at some point, a "watch dog" thread checks the main thread. This parameter specifies the number of milliseconds between checks. If the process remains in the same state after three checks, it is terminated by the watch dog thread.
This parameter can easily be changed for purposes of experimentation or to adapt to local conditions. It can be specified on a per-node basis, although there seems to be little reason for doing so.
The default timeout is 4000 milliseconds (4 seconds).
[NDBD]StartPartialTimeout
This parameter specifies the time that the cluster will wait for all storage nodes to come up before the cluster initialization routine is invoked. This timeout is used to avoid a partial Cluster startup whenever possible.
The default value is 30000 milliseconds (30 seconds). A value of 0 means that the timeout never expires; in other words, the cluster may start only if all nodes are available.
[NDBD]StartPartitionedTimeout
If the cluster is ready to start after waiting for
StartPartialTimeout milliseconds but is still
in a possibly partitioned state, the cluster waits until this
timeout has also passed.
The default timeout is 60000 milliseconds (60 seconds).
[NDBD]StartFailureTimeout
If a data node has not completed its startup sequence within the time specified by this parameter, then the node startup will fail. Setting this parameter to 0 means that no data node timeout is applied.
The default value is 60000 milliseconds (60 seconds). For data nodes containing extremely large amounts of data, this parameter should be increased. For example, in the case of a storage node containing several gigabytes of data, a period as long as 10-15 minutes (that is, 600,000 to 900,000 milliseconds) might be required in order to perform a node restart.
[NDBD]HeartbeatIntervalDbDb
One of the primary methods of discovering failed nodes is by the use of heartbeats. This parameter states how often heartbeat signals are sent and how often to expect to receive them. After missing three heartbeat intervals in a row, the node is declared dead. Thus the maximum time for discovering a failure through the heartbeat mechanism is four times the heartbeat interval.
The default heartbeat interval is 1500 milliseconds (1.5 seconds). This parameter must not be changed drastically and should not vary widely between nodes. If one node uses 5000 milliseconds and the node watching it uses 1000 milliseconds then obviously the node will be declared dead very quickly. This parameter can be changed during an online software upgrade but only in small increments.
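The worst-case detection time follows directly from the interval, as in this short Python sketch. The function name is hypothetical.

```python
def max_failure_detection_ms(heartbeat_interval_ms):
    """Upper bound on the time needed to detect a failed node.

    A node is declared dead after three missed intervals, so the
    maximum is four times the heartbeat interval.
    """
    return 4 * heartbeat_interval_ms


print(max_failure_detection_ms(1500))  # 6000
```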
[NDBD]HeartbeatIntervalDbApi
Each data node sends heartbeat signals to each MySQL server (SQL
node) to ensure that it remains in contact. If a MySQL server
fails to send a heartbeat in time it is declared "dead", in which
case all ongoing transactions are completed and all resources
released. The SQL node cannot reconnect until all activities
started by the previous MySQL instance have been
completed. The three-heartbeat criteria for this determination are
the same as described for
HeartbeatIntervalDbDb.
The default interval is 1500 milliseconds (1.5 seconds). This interval can vary between individual data nodes because each storage node watches the MySQL servers connected to it, independently of all other data nodes.
[NDBD]TimeBetweenLocalCheckpoints
This parameter is an exception in that it doesn't specify a time to wait before starting a new local checkpoint; rather, it is used to ensure that local checkpoints are not performed in a cluster where relatively few updates are taking place. In most clusters with high update rates, it is likely that a new local checkpoint is started immediately after the previous one has been completed.
The size of all write operations executed since the start of the previous local checkpoint is added up. This parameter is also exceptional in that it is specified as the base-2 logarithm of the number of 4-byte words, so that the default value 20 means 4MB (4 * (2 ^ 20)) of write operations, 21 would mean 8MB, and so on up to a maximum value of 31, which equates to 8GB of write operations.
All the write operations in the cluster are added together.
Setting TimeBetweenLocalCheckpoints to 6 or
less means that local checkpoints will be executed continuously
without pause, independent of the cluster's workload.
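Because this parameter is a base-2 logarithm rather than a time, it can help to convert a setting into the amount of write activity it represents. The following is a minimal sketch of that conversion, assuming only the formula given above (4 bytes times 2 raised to the parameter value); the function name is hypothetical.

```python
BYTES_PER_WORD = 4  # the text specifies 4-byte words


def lcp_write_threshold_bytes(value: int) -> int:
    """Convert a TimeBetweenLocalCheckpoints setting into bytes of writes."""
    if not 0 <= value <= 31:
        raise ValueError("the text gives 31 as the maximum value")
    return BYTES_PER_WORD * 2 ** value


for setting in (20, 21, 31):
    megabytes = lcp_write_threshold_bytes(setting) // (1024 * 1024)
    print(setting, megabytes, "MB")
```

The settings 20, 21, and 31 correspond to 4MB, 8MB, and 8192MB (8GB) of write operations, matching the figures given above.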
[NDBD]TimeBetweenGlobalCheckpoints
When a transaction is committed, it is committed in main memory in all nodes on which the data is mirrored. However, transaction log records are not flushed to disk as part of the commit. The reasoning behind this behavior is that having the transaction safely committed on at least two autonomous host machines should meet reasonable standards for durability.
It is also important to ensure that even the worst of cases -- a complete crash of the cluster -- is handled properly. To guarantee that this happens, all transactions taking place within a given interval are put into a global checkpoint, which can be thought of as a set of committed transactions that has been flushed to disk. In other words, as part of the commit process, a transaction is placed in a global checkpoint group; later, this group's log records are flushed to disk, and then the entire group of transactions is safely committed to disk on all computers in the cluster as well.
This parameter defines the interval between global checkpoints. The default is 2000 milliseconds.
[NDBD]TimeBetweenInactiveTransactionAbortCheck
Time-out handling is performed by checking a timer on each transaction once for every interval specified by this parameter. Thus, if this parameter is set to 1000 milliseconds, then every transaction will be checked for timing out once per second.
The default value for this parameter is 1000 milliseconds (1 second).
[NDBD]TransactionInactiveTimeout
If the transaction is currently not performing any queries but is waiting for further user input, this parameter states the maximum time that the user can wait before the transaction is aborted.
The default for this parameter is zero (no timeout). For a real-time database that needs to ensure that no transaction keeps locks for too long, this parameter should be set to a relatively small nonzero value. The unit is milliseconds.
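TransactionInactiveTimeout and TimeBetweenInactiveTransactionAbortCheck interact, because a timed-out transaction is only noticed at the next timer check. The sketch below models this in a simplified way. It assumes that an idle transaction is aborted at the first check after its timeout has elapsed, which is an inference from the two descriptions above rather than a documented guarantee, and the function name is hypothetical.

```python
from typing import Optional


def worst_case_abort_delay_ms(inactive_timeout_ms: int,
                              check_interval_ms: int) -> Optional[int]:
    """Upper bound on idle time before abort, under the stated assumption."""
    if inactive_timeout_ms == 0:
        # From the text: zero means that no timeout is applied.
        return None
    # Assumption: the timeout may expire just after a check has run, so one
    # further full check interval can pass before the abort takes place.
    return inactive_timeout_ms + check_interval_ms


print(worst_case_abort_delay_ms(0, 1000))    # default: no timeout
print(worst_case_abort_delay_ms(500, 1000))  # hypothetical 500 ms timeout
```

Under this model, lowering TransactionInactiveTimeout far below the check interval brings little benefit unless the check interval is reduced as well.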
[NDBD]TransactionDeadlockDetectionTimeout
When a node executes a query involving a transaction, the node waits for the other nodes in the cluster to respond before continuing. A failure to respond can occur for any of the following reasons:
The node is "dead".
The operation has entered a lock queue.
The node requested to perform the action is heavily overloaded.
This timeout parameter states how long the transaction coordinator will wait for query execution by another node before aborting the transaction, and is important for both node failure handling and deadlock detection. Setting it too high can cause undesirable behavior in situations involving deadlocks and node failure.
The default timeout value is 1200 milliseconds (1.2 seconds).
[NDBD]NoOfDiskPagesToDiskAfterRestartTUP
When executing a local checkpoint the algorithm flushes all data
pages to disk. Merely doing so as quickly as possible without any
moderation is likely to impose excessive loads on processors,
networks, and disks. To control the write speed, this parameter
specifies how many pages per 100 milliseconds are to be written.
In this context, a "page" is defined as 8KB; thus, this parameter
is specified in units of 80KB per second. Therefore, setting
NoOfDiskPagesToDiskAfterRestartTUP to a value
of 20 means writing 1.6MB of data pages to disk each second during
a local checkpoint. This value includes the writing of UNDO log
records for data pages; that is, this parameter handles the
limitation of writes from data memory. UNDO log records for index
pages are handled by the parameter
NoOfDiskPagesToDiskAfterRestartACC. (See the
entry for IndexMemory for information about
index pages.)
In short, this parameter specifies how quickly local checkpoints
will be executed, and operates in conjunction with
NoOfFragmentLogFiles,
DataMemory, and IndexMemory.
The default value is 40 (3.2MB of data pages per second).
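The relationship between a setting and the resulting write rate can be checked with a short calculation. This sketch uses only the figures given above (8KB pages, written per 100 milliseconds); the names are illustrative.

```python
PAGE_SIZE_KB = 8       # the text defines a page as 8KB
TICKS_PER_SECOND = 10  # pages are counted per 100 milliseconds


def checkpoint_write_rate_kb_per_s(pages_per_100ms: int) -> int:
    """Write rate in KB per second implied by a NoOfDiskPagesToDisk* value."""
    return pages_per_100ms * PAGE_SIZE_KB * TICKS_PER_SECOND


print(checkpoint_write_rate_kb_per_s(20))  # 1600 KB/s, the "1.6MB" in the text
print(checkpoint_write_rate_kb_per_s(40))  # 3200 KB/s, the "3.2MB" default
```

Note that the megabyte figures quoted in this section treat 1MB as 1000KB. The same conversion applies to all four NoOfDiskPagesToDisk parameters.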
[NDBD]NoOfDiskPagesToDiskAfterRestartACC
This parameter uses the same units as
NoOfDiskPagesToDiskAfterRestartTUP and acts in
a similar fashion, but limits the speed of writing index pages
from index memory.
The default value of this parameter is 20 index memory pages per 100 milliseconds (1.6MB per second).
[NDBD]NoOfDiskPagesToDiskDuringRestartTUP
This parameter is used in a fashion similar to
NoOfDiskPagesToDiskAfterRestartTUP and
NoOfDiskPagesToDiskAfterRestartACC, except that it
applies to local checkpoints executed in a node while
that node is restarting. As part of all node restarts a local
checkpoint is always performed. During a node restart it is
possible to write to disk at a higher speed than at other times,
because fewer activities are being performed in the node.
This parameter covers pages written from data memory.
The default value is 40 (3.2MB per second).
[NDBD]NoOfDiskPagesToDiskDuringRestartACC
Controls the number of index memory pages that can be written to disk during the local checkpoint phase of a node restart.
As with NoOfDiskPagesToDiskAfterRestartTUP and
NoOfDiskPagesToDiskAfterRestartACC, values for
this parameter are expressed in terms of 8KB pages written per 100
milliseconds (80KB/second).
The default value is 20 (1.6MB per second).
[NDBD]ArbitrationTimeout
This parameter specifies the time that the data node will wait for a response from the arbitrator to an arbitration message. If this is exceeded, then it is assumed that the network has split.
The default value is 1000 milliseconds (1 second).
Buffering and Logging
Several configuration parameters corresponding to former compile-time parameters were introduced in MySQL 4.1.5. These enable the advanced user to have more control over the resources used by node processes and to adjust various buffer sizes as needed.
These buffers are used as front-ends to the file system when writing log records to disk. If the node is running in diskless mode, then these parameters can be set to their minimum values without penalty due to the fact that disk writes are "faked" by the NDB storage engine's filesystem abstraction layer.
[NDBD]UndoIndexBuffer
This buffer is used during local checkpoints. The NDB storage engine uses a recovery scheme based on checkpoint consistency in conjunction with an operational REDO log. In order to produce a consistent checkpoint without blocking the entire system for writes, UNDO logging is done while performing the local checkpoint. UNDO logging is activated on a single table fragment at a time. This optimization is possible because tables are stored entirely in main memory.
The UNDO index buffer is used for the updates on the primary key hash index. Inserts and deletes rearrange the hash index; the NDB storage engine writes UNDO log records that map all physical changes to an index page so that they can be undone at system restart. It also logs all active insert operations for each fragment at the start of a local checkpoint.
Reads and updates set lock bits and update a header in the hash index entry. These changes are handled by the page-writing algorithm to ensure that these operations need no UNDO logging.
This buffer is 2MB by default. The minimum value is 1MB, and for
most applications the minimum is sufficient. For applications
doing extremely large and/or numerous inserts and deletes together
with large transactions and large primary keys, it may be
necessary to increase the size of this buffer. If this buffer is
too small, the NDB storage engine issues internal error code 677
Index UNDO buffers overloaded.
[NDBD]UndoDataBuffer
The UNDO data buffer plays the same role as the UNDO index buffer, except that it is used with regard to data memory rather than index memory. This buffer is used during the local checkpoint phase of a fragment for inserts, deletes, and updates.
Since UNDO log entries tend to grow larger as more operations are logged, this buffer is also larger than its index memory counterpart, with a default value of 16MB.
For some applications this amount of memory may be unnecessarily large. In such cases it is possible to decrease this size down to a minimum of 1MB.
It is rarely necessary to increase the size of this buffer. If there is such a need, then it is a good idea to check if the disks can actually handle the load caused by database update activity. A lack of sufficient disk space cannot be overcome by increasing the size of this buffer.
If this buffer is too small and gets congested, the NDB storage
engine issues internal error code 891 Data UNDO buffers
overloaded.
[NDBD]RedoBuffer
All update activities also need to be logged. This log makes it possible to replay these updates whenever the system is restarted. The NDB recovery algorithm uses a "fuzzy" checkpoint of the data together with the UNDO log, then applies the REDO log to play back all changes up to the restoration point.
This buffer is 8MB by default. The minimum value is 1MB.
If this buffer is too small, the NDB storage engine issues error
code 1221 REDO log buffers overloaded.
In managing the cluster, it is very important to be able to control
the number of log messages sent for various event types to
stdout. There are 16 possible event levels
(numbered 0 through 15). Setting event reporting for a given event
category to level 15 means all event reports in that category are
sent to stdout; setting it to 0 means that there
will be no event reports made in that category.
By default, only the startup message is sent to
stdout, with the remaining event reporting
level defaults being set to 0. The reason for this is that these
messages are also sent to the management server's cluster log.
An analogous set of levels can be set for the management client to determine which event levels to record in the cluster log.
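The effect of a reporting level can be illustrated with a small filter. This is a sketch under an assumption: each event is taken to carry a priority from 1 to 15 and to be reported when that priority does not exceed the level configured for its category. The text above states only the behavior at the two extremes (0 reports nothing, 15 reports everything), so the rule for intermediate values and the function name here are hypothetical.

```python
def is_reported(event_priority: int, configured_level: int) -> bool:
    """Decide whether an event is sent to stdout (assumed threshold rule)."""
    if not 0 <= configured_level <= 15:
        raise ValueError("levels are numbered 0 through 15")
    if not 1 <= event_priority <= 15:
        raise ValueError("assumed event priorities run from 1 through 15")
    # Assumption: lower priorities denote more important events, so raising
    # the configured level lets progressively more events through.
    return event_priority <= configured_level


print(is_reported(1, 1))    # a priority-1 event at level 1
print(is_reported(8, 0))    # level 0 suppresses the whole category
print(is_reported(15, 15))  # level 15 reports every event
```

Under this rule, a category left at level 0 produces no output, while level 15 passes every event in the category.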
[NDBD]LogLevelStartup
Reporting level for events generated during startup of the process.
The default level is 1.
[NDBD]LogLevelShutdown
Reporting level for events generated as part of graceful shutdown of a node.
The default level is 0.
[NDBD]LogLevelStatistic
Reporting level for statistical events such as number of primary key reads, number of updates, number of inserts, information relating to buffer usage, and so on.
The default level is 0.
[NDBD]LogLevelCheckpoint
Reporting level for events generated by local and global checkpoints.
The default level is 0.
[NDBD]LogLevelNodeRestart
Reporting level for events generated during node restart.
The default level is 0.
[NDBD]LogLevelConnection
Reporting level for events generated by connections between cluster nodes.
The default level is 0.
[NDBD]LogLevelError
Reporting level for events generated by errors and warnings in the cluster as a whole. These errors do not cause any node failure but are still considered worth reporting.
The default level is 0.
[NDBD]LogLevelInfo
Reporting level for events generated for information about the general state of the cluster.
The default level is 0.
Backup Parameters
The parameters discussed in this section define memory buffers set aside for execution of online backups.
[NDBD]BackupDataBufferSize
In creating a backup, there are two buffers used for sending data
to the disk. The backup data buffer is used to fill in data
recorded by scanning a node's tables. Once this buffer has been
filled to the level specified as
BackupWriteSize (see below), the pages are sent
to disk. While flushing data to disk, the backup process can
continue filling this buffer until it runs out of space. When this
happens, the backup process pauses the scan and waits until some
disk writes have completed and have thus freed up memory so that
scanning may continue.
The default value for this parameter is 2MB.
[NDBD]BackupLogBufferSize
The backup log buffer fulfils a role similar to that played by the backup data buffer, except that it is used for generating a log of all table writes made during execution of the backup. The same principles apply for writing these pages as with the backup data buffer, except that when there is no more space in the backup log buffer, the backup fails. For that reason, the size of the backup log buffer must be big enough to handle the load caused by write activities while the backup is being made.
The default value for this parameter should be sufficient for most applications. In fact, it is more likely for a backup failure to be caused by insufficient disk write speed than it is for the backup log buffer to become full. If the disk subsystem is not configured for the write load caused by applications, the cluster is unlikely to be able to perform the desired operations.
It is preferable to configure cluster nodes in such a manner that the processor becomes the bottleneck rather than the disks or the network connections.
The default value is 2MB.
[NDBD]BackupMemory
This parameter is simply the sum of
BackupDataBufferSize and
BackupLogBufferSize.
The default value is 2MB + 2MB = 4MB.
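The sum can be checked using the size suffixes described at the start of this section (k, M, and G, which are case-sensitive). The following sketch parses such values and adds the two buffer sizes; the helper name parse_size is hypothetical and is not part of any cluster tool.

```python
# Suffix multipliers as described in the text: units of 1024.
SUFFIXES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a value such as '100k' or '2M' into a number of bytes."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)  # no suffix: the value is taken as given


backup_data_buffer_size = parse_size("2M")  # default BackupDataBufferSize
backup_log_buffer_size = parse_size("2M")   # default BackupLogBufferSize
backup_memory = backup_data_buffer_size + backup_log_buffer_size

print(parse_size("100k"))  # 102400, as in the example in the text
print(backup_memory)       # 4194304 bytes, that is, 4MB
```

If either buffer size is changed from its default, BackupMemory should be recalculated in the same way.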
[NDBD]BackupWriteSize
This parameter specifies the size of messages written to disk by the backup log and backup data buffers.
The default value is 32KB.
© 1995-2005 MySQL AB. All rights reserved.
