Friday, July 11, 2014

MariaDB Replication, MaxScale and the need for a binlog server


This is an introduction to MariaDB Replication and to why we need a binlogs server and what this is. The first part is an introduction to replication basics, and if you know this already, then you want want to skip past the first section or two.

MariaDB Replication

MySQL and MariaDB has a simple but very effective replication system built into it. The replication system is asynchronous and is based on a pull, instead of a push, system. What this means in short is that the Master keeps track of the DML operations and other things that might change the state of the master database and this is stored in what is called the binlog. The slave on the other hand is responsible for getting the relevant information from the master to keep up to speed. The binlogs consist of a number of files that the master generates, and the traditional way of dealing with slaves is to point them to the master, specifying a starting point in the binlogs consisting of a filename and a position.

When a slave is started it gets the data from the binlogs, one record at the time, from the given position in the master binlogs and in the process updates the current binlogs file and position. So the master keep track of the transactions and the slave follows behind as fast as it can. The slave has two types of threads, the IO thread that gets data from the master and to a separate relay log on the slave, and an SQL thread that applies the data from the relay log to the slave database.
This really is less complicated than it sounds, in a way, but the implementation of it on the other hand is probably more complicated than one might think. There are also some issues with this setup, some which is fixed by the recent GTID implementation in MySQL 5.6 and more significantly in MariaDB 10.

In a simple setup with just 1 master and a few slaves, this is all there is to it: Take a backup from the master and take to keep track of the binlogs position when this is done, then recover this backup to a slave and then set the slave starting position at the position when the backup was executed. Now the slave will catch up with the operations that has happened since the backup was run and eventually it will catch up with the master and then poll for any new events.

One that that is not always the case with all database systems is that the master and the slave are only different in the sense of the configuration. Except this, these servers run the same software and the same operations can be applied to them, so running DML on a slave is no different than on a master, but if there is a collision between some data that was entered manually on the slave and some data the arrives through replication, say there is a duplicate primary key, then replication will stop.

What do we use replication for

Replication is typically used for one or more of three purposes:
•    Read Scale-out - In this case the slaves are used for serving data that is read, whereas all writes go to the master. As in most web applications there is a much larger amount of reads than writes, this makes for good scalability and we have fewer writes to be handled by the single master, whereas the many reads can be server by one or more slaves.
•    Backup - Using a slave for backup is usually a pretty good idea. This allows for cold backups even, as if we shut down the slave for backing it up, it will catch up with the master once restarted after the backup is done. Having a full database setup on a slave for backup also means that recovery times of we need to do that, is fast and also allows for partial recovery if necessary.
•    High Availability - As the master and Slave are kept in sync, at least with some delay, one can sure use the slave to fail over to should the master fail. The asynchronous nature of replication does mean though that failover is a bit more complex than one might think. There might be data in the relay slave log that has not yet been applied and might cause issues when the slave is treated as a master. Also, there might be data in the binlogs on the master which means that when the master is again brought on-line, it might be out of sync with the slave: some data that never was picked up by the slave might be on the master and data that entered the system after the slave was switched to a master is not on the old master.

The nature of replicated data

In the old days, the way replication worked was by just sending any statement that modified data on the master to the slave. This way of working is still available, but over time it was realized that this was a bit difficult, in particular with some storage engines.

This led to the introduction of Row-based Replication (RBR) where the data to be replicated is transferred not as a SQL statement but as a binary representation of the data to be modified.

Replicating the SQL statement is called Statement Based Replication (SBR). An example of a statement that can cause issues when using SBR is:
DELETE FROM test.tab1 WHERE id > 10 LIMIT 5
In this case RBR will work whereas when using statement based replication we cannot determine which rows will be deleted. There are more examples of such non-deterministic SQL statements where SBR fails but RBR works.

A third replication format mode is available, MIXED, where MariaDB decided ona statement by statement base which replication format is best.

Scaling replication

Eventually many users ended up having many slaves attached to that single master. And for a while, this was not a big issue, the asynchronous nature of things means that the load on the master was limited when using replication, but with enough nodes, eventually this turned in to being an issue.

The solution then was to introduce an "intermediate master". This is a slave that is also a master to other slaves, and this is configured having log_slave_updates on, which means that data that is applied on the slave from the relay log and into the slave, are also written to the binlog.

This is a pretty good idea, but there are some issues also. To begin with, on the intermediate master, data has to be written several times, once in the relay log, once in the database (and if InnoDB is used, a transaction log is also written) and then we have to write it to the binlog.

Another issue that is in effect here is the single threaded nature of replication (this is different in MariaDB 10 and MySQL 5.7 and up), which means that a slave on a master that runs many threads, might get into a situation where the slave can't keep up with the master, even though the slave is similarly configured as the master. Also, a run running statement on the master will hold up replication for as long as that statement runs on the slave, and if we have an intermediate master, then the delay will be doubled (once on the intermediate master and once on the actual slave).

The combined effect of the duplication of the delay and the requirement to write data so many times, leads to the result that an intermediate master maybe isn't such a good idea after all.
As for the replication use-cases, intermediate masters are sometimes used as alternative masters when failing over. This might seems like a good idea, but the issue is that the binlogs on the intermediate master doesn't look the same as the binlogs on the actual master. This is fixed by using Global Transaction IDs though, but these have different issues and unless you are running MariaDB 10 or MySQL 5.6, this isn't really an option (and even with MySQL 5.6, there are big issues with this).

What we need then is something else. Something that is a real intermediate master. Something that looks like a slave to the master and as a master to the slave, but doesn't have to write data three times first and that doesn't have to apply all the replication data itself so it doesn't introduce delays into the replication chain.

The slave that attaches to this server should see the same replication files as it would see it it connected to the real master.

MaxScale and the Plugin architecture

So let's introduce MaxScale then, and the plugin architecture. MaxScale has been described before, but one that that might not be fully clear is the role of the plugins. MaxScale relies much more on the plugins that most other architectures, fact is, without the plugins, MaxScale can't do anything, everything is a plugin!

The MaxScale core is a multi-threaded epoll based kernel with 5 different types of plugins (note that there might be more than one plugin of each type, and this is mostly the case actually:
•    Protocols - These implement communication protocols, including debugging and monitoring protocols. From this you realize that without appropriate protocol plugins, MaxScale will not be able to be accessed at all, so these modules are key. Among the current protocols are MariaDB / MySQL Client and Server protocols.
•    Authentication - This type of plugin authenticates users connecting to MaxScale. Currently MariaDB / MySQL Authentication is supported.
•    Router - This is a key type of module that determines how SQL traffic is routed an managed.
•    Filter - This is an optional type of pluging there the SQL traffic can be modfied, checked or rejected,
•    Monitor - This type of modules is there to monitor the servers that MaxScale connects to, and this data is used by the routing mode.

Before we end this discussion on MaxScale, note that there might be several configurations through one single MaxScale setup, so MaxScale can listen to one prot for one set up servers and routine setup, and on another port for a different setup.

With this we have an idea how MaxScale work, so let's see if we can tie it all up.

MaxScale as a Binlog server

As can be seen from the description of MaxScale a lot of what is needed to create a Binlog server to use as an intermediate server for slaves is there. What is needed is a router module that acts as a slave to the assigned master, downloads the binlogs from there, using the usual MariaDB / MySQL Replication protocol. This routing plugin also needs to serve the slaves with the downloaded binlogs files. In theory, and also in practice, the slaves will not know if it is connected to the real master or to MaxScale.

Using MaxScale this way as an intermediate Master, a slave that connects to the MaxScale can work from the same Binlog files and positions as when connected directly to the master, as the files are the same for all intents and purposes. There will be no extra delays for long running SQL statements as these aren't applied on MaxScale, the replication data is just copied from the master, plain and simple. As for parallel slaves, this should work better in when using MaxScale as a Binlog server, but this is yet to be tested.

So there should be many advantages to using MaxScale as a binlog server compared to using an intermediate MariaDB / MySQL server. On the other hand, this solution is not for everyone, many just doesn't drive replication that hard that the load on the master is an issue so that an intermediate Master is requited. On the other, many use an Intermediate Master also for HA, and in this case it would have be advantageous to use MaxScale instead of that Intermediate master, the latter which could still server the role as a fail-over HA server.

Now, there one issue with all of this that many of you might have spotted: That cool Binlog server plugin module for MaxScale doesn't exists. Well, I am happy to say that you are wrong, it does exist and it works. A Pilot for such a module has been developed by SkySQL together with that had just this need for an intermediate server that wasn't just yet another MariaDB / MySQL server. For the details on the specific usecase, see the blog by Jean-François Gagné.

Monday, June 16, 2014

MyQuery 3.5.5 Released

I have just released version 3.5.5 of MyQuery, which is a minor feature, cleanup and bugfix release. If you don't know MyQuery since before, this is an Open-Source Windows based MySQL and MariaDB ad-hoc query tool. What makes MyQuery slightly different from all the other similar tools is that MyQuery has a focus on SQL-scripting, allowing statements in a script to be run one at the time, to restart a script where it left of in the case of an error and some other features like this. MyQuery features colour coded syntax as it uses Scintilla for editing, and this is highly configurable. Also, multiple editing tabs are supported. Another thing that makes MyQuery stand out a bit is that it is highly flexible. If you have SQL statements that you run often to monitor the state of the server or your application, then it is real easy to implement this as a simple tool accessible from the MyQuery menu.

There is much more to MyQuery than this, so if you think this sounds cool, then download it from sourceforge and give it a shot, you might like it!


Tuesday, April 29, 2014

SQLStats plugin version 1.4 released

I have now released version 1.4 of my sqlstats plugin for MySQL and MariaDB. As of this version I focus on test with MariaDB, but it should work with MySQL also. In addition, I have now made the code a bit more portable by removing some features that depended on the THD struct.

So, what is sqlstats then, you ask? Well, the idea is to keep track of executed SQL statements and gather up statistics on how often they are used. The interesting thing is that sqlstats, before counting the number of executions of a statement, "normalizes" it, by which I mean that it regards
SELECT prod_name FROM products WHERE prod_id = 90;
as the same statement as
SELECT prod_name FROM products WHERE prod_id = 212;
This is useful to track those statements that takes very little time to execute, so probably they don't show up that often in SHOW PROCESSLIST, but are executed to so often to the performance effect can be real bad.
Also, this is useful to track those fast statements that are executed often, but that actually doesn't need to be executed that often as the data is cached or the result is already known or something.

So how does all this work then?
Well, the plugin is an AUDIT and an INFORMATION_SCHEMA plugin in one. The AUDIT plugin part is used to track the statements, every statement that is execute in the server passes this. When a statement is processed by this plugin, the statement is normalized and then it is checked for existence in a list of least recently used statements. This is to ensure that I don't have to keep track of all statements executed to keep a top list. Then if the statements is executed so often that it should be on the top list, then it is placed there. The reason I can't do with the top list only is that if a "new" statement that is frequently executed gets in, and the top list is already full, it will never get on the top list at all (as it is not on the list, it's executing count is 1 and it doesn't reach the list, then the same thing happens next time etc).

These lists of statements are kept in memory inside the plugin itself.

Now, to see what the top SQL statements are, this is done by the plugin also being an INFORMATION_SCHEMA plugin, so the data is available by a simple select:
SELECT * FROM information_schema.sqlstats_topsql;
And that's it!
There is also an INFORMATION_SCHEMA table to show the last executed SQL statements:
SELECT * FROM information_schema.sqlstats_lastsql;

Now, you may ask what the performance overhead is of all this, and fact is that it's rather small, mostly hardly noticeable. In addition, you can switch it off using the sqlstats_enable global variable.
And the size of the top list and lru lists can also be adjusted by in the sqlstats_top_stmts and sqlstats_lru_stmts global variables respectively.

The plugin is downloadable from sourceforge which contains the sourcecode which uses GNU autotools to build it. Also, there is a full documentation pdf there.


Monday, March 3, 2014

MyQuery 3.5.4 Released - Now with Client side dynamic columns!

It was quite a while ago since I released a new version of MyQuery, my Windows based query tool for MariaDB and MySQL. I did build a 3.5.3 version, but I decided not to do a public release of that. But now we have 3.5.4 ready, and there is some new shiny features in it. The main feature is that the dialog that you can use to inspect individual fields with, which pops up when you doubleclick (or now you can Right-click and select Show Menu) in a field, has been expanded.

To being with, you can use this to navigate the fields, so that you don't have to close this dialog and open it in the next field, instead there are navigational buttons in the Dialog.

But there is also some additions to the dialog itself. If you are using MariaDB and the Dynamic Columns feature, which is mostly used with the Cassandra Storage Engine but can be used with any blob data in any engine, then you can view these in the ShowData dialog, either as a tree or as JSON.

Another addition is that I have upgraded help links to MySQL 5.6 and MariaDB 10. Also, a lot of MariaDB features and now syntax coloured as appropriate. Another usability feature is the ability to copy a single column from a query result to the clipboard.

Download MySQL 3.5.4 from sourceforge and enjoy!


The soul of a new machine

I re-read "The Soul of a new Machine" (Swedish translation "En Dators Födelse") by Tracy Kidder the other day and it still quite a marvelous book. It was a long time ago since I read it last and I have to admit this really made me feel old. And I admit: Yes, I worked with VAXen (yes, that is what pluralis of a VAX is, among us hackers from the days when being a "hacker" was a good thing. I used to be called Anders "Hackin'" Karlsson back then) and I worked with Data General machines and I have also worked as a Hardware Engineering back when working with a CPU was not necessarily something done in software and when if you didn't have enough bits in a register in a register in the CPU, you could add them yourself with some TTL chips piggy-backed onto the ones on the main board itself (in this case the "register board") and then attached at appropriate points with hand-soldered wires and some paths on the board appropriately cut.

Piggy-backing was used to get power to the added chips, and to mount them somewhere convenient, instead of having the just floating around in the wires, it looked somewhat like this:
But instead of connection all pins you just connected the power and grounds ones (on the upper right and lower left corners respectively) and the other pins were bent to point straight out, and then you soldered the wires to these pins and then attaching the other end of these.

And before you ask, before going on to do this, this had to be designed, somehow, and I did that too. In addition, you had to understand the original design of this, but that I figured out without too much work, What I never did do was to write even a single piece of code for this monster machine. How I got involved with this was interesting. I was working as a sysadmin for the development system machine for this box, which was a PABX based on Meridian but modified by Swedish Televerket (not Telia). When this localized version needed upgrading (as machines requiring higher capacity had been sold), it was assumed that the newer Nortel Meridian parts could be used. But this rurned out not to be that case, Nortel and developed the Meriad further since the Localized version was released, and the local version had also changed, but in a different direction, the parts actually wouldn't even physically fit!

Disaster! What do we do NOW? The customer has already bough this thing and now we can't deliver? And in Sweden, there wasn't much hardware enginering resources for this puppy anymore. So someone the manager of the development of this box thinks a bit and then realized that there is this weirdo young sysadmin, who by the way knows absolutely nothing about Meridian (but a fair amount about the development system for it, which was running Unix, Interactive Unix, based on Unix Version 6, no less! running on a PDP/11 70). This sysadmin boy had at some point said something that he was building his own home computer (which I was) so maybe he could have a look at this? Now, I was not the least qualified for. Not at all, so I obviously said no... Not! I think many of you were like that in your twenties also: You knew just about EVERYTHING on computers, and if someone asked you to do something, you just said yes. In this case I was asked to extend the page bank register in the SL/1 (which was the technical name for the Meridin and was a 16-bit machine with multiple 64k banks of memory).

Did I get this to work? Yes, I did. Don't ask me how, though. Today, some 30 years later, I still don't understand how I managed to do this, and if I was asked today, stuffed to the rim with training, knowledge and experience, to modify the CPU of some machine, I would assume the person asking had used excessive amounts of paint remover, but not with the intent of removing paint but with the quite successful intent of removing whatever small portion that is left of common sense from his/her brain.

This was my last job at Telia / Televerket (not because what I did didn't work. As far as I know, the CPU I designed and then built, was put in beta test at the Telia office and when I later called an old colleague there and asked what happened to my Heath Robinson style PABX I was told that it was in daily use and that I was, in fact, using it right as we spoke).  After this I joined another telco operator and then I went on to work for Oracle, where I at first was the local VAX guy and later the local Unix guy (also, I was the local Mac guy as I was the only one there who had even used on, and I had used it to run Pagemaker for editing a fanzine that I was working on at that time).

OK, time to wake up. Maybe I should port MySQL to the SL/1 or something today? But I need an extra bit in the pagebank register for that I guess? Well that can be fixed, is there some young, unsuspecting sysadmin around?

Also, read "The Soul of a new Machine" if you haven't done so. It does say a lot about how people in the IT indistry work and how we look at ourselves. And it's a probably a good read for a spouse or girlfriend/boyfriend who is not in the IT-industry, as this is not really a technical book (except to a very small extent).