Bucardo replication and PostgresSQL some lessons learned

Posted at 7:03:05 PM in SQL (4)

It was time to start the replication services on my server. After reviewing the different packages, Burcardo was the one I chose.

I followed the steps for the most part on bucardo.org. The site seems to be out-dated. From what I could tell, it hasn't been updated since 2009, however; if you look at the links to the discussion pages, you can see the developer is very active as are the users. I believe his/she is mostly involved with development of the software and not as concerned with the web's content. It's nice that this application is free. I certainly takes a lot of the developer's time.

There are many comments about Bucardo 5, but I wasn't able to find any links to it. The version of Bucardo I installed was 4.99, so I must be close. The documentation also isn't that great. For example, herd has been replaced by relgroup, but the documentation uses both terms and after some search, I found that relgroup is herd, but many internal parts still use herd. I'm not sure if that means that herd can be used instead of relgroup or if relgroup cannot be used for all commands. 

I could not find anything on conflict strategy on the web site. bucardo_ctl also seems to be missing in my version or has been replaced with just bucardo. All the commands on the web site that are used with bucardo_ctl works with bucardo on my system. However, I have a Ubuntu version 12.0.3T on another machine that I installed the same way as I did this version of Ubuntu 13.x.x and the Ubuntu version 12 does have the bucardo_ctl.

Back to conflict strategy. When doing an bucardo add sync, you can see the options allow for a conflict_strategy option with it's various options, but you cannot find anything on it on the web site. I didn't put in the conflict strategy when I created my sync so I tried to update the sync with the bucardo update sync yada yada, but everything I entered there only gave me an error message without any clue as to what was missing.

No matter what options you supply, this is the message you get every time:

root@db:~# bucardo update sync


      bucardo update <type> <name> <parameters>

    Updates a Bucardo object. The `type' specifies the type of object to update, while the `name' should be the name of the object. The supported  parameters for each type are the same as those for the add entry  elsewhere in this document. The supported types are:


The internal help or error system helps with the type, which is correct, but no help on the name or parameters. This is the command line I used, which based on the bucardo add sync options should have worked...

bucardo update sync first conflict_strategy="bucardo_source"

I tried without the quotes also, both resulted in the same error message. Access to the database bucardo maintains these settings in is available, so I intend to add other sync's and see what the field conflict strategy gets updated to and then manually correct the tables for those that I can't update.

The other frustrating thing is that the documentation states that the dbs, which identify the data sources, specify that the first database will be the source and the rest will be the targets. This did not work for me. In fact, I wasn't able to get it to work correctly until I specified each database and how I wanted it to process. 

The sample on the web site confused me as it appeared that they created the two test databases on the same server (they probably did, but I wondered who would want to replicate data on the same server since the object of replication is to have a backup or load balancing [which is an option for discussion all by itself]). I also misunderstood that bucardo needs to be installed on all the servers replication is to take place on, also not true. In fact, bucardo doesn't need to be installed on any of the servers master or slave.

I'm not sure why my installation didn't follow the documentation. I made a backup of my database because I wasn't sure what was going to happen and it was a good thing I did too. The server my source database is on is the same server I installed bucardo on. So the connection I made to my source database was using localhost. The second database was at another location which I specified with the FQDN. This was the first concept that I had to get straight in my head.

Since bucardo only needs to be installed on one server (though technically it can be installed on other servers and I suppose that would be necessary for chaining to work, i.e. the slave database could also be the master for another database and right now, with my limited understanding, the only way I can see that happening is if bucardo was installed on the slave with a identifying that machine as the source and another as the target), all the database connections need to be identified on that server. Each database that will be involved is specified in the bucardo add db command line which provides the connection string information, whether it be on this server or another server.

The documentation does say that on slave servers, only the bucardo user (super user) needs to be on the target machines. When I setup my first trial, I used the bucardo user, but later installs, I used the actual user that connects to that database and it has worked fine.

The description of the db <name> and dbname <name> didn't really explain to me what I was doing. It lead me to believe that the names for the same database in the different servers would be different, but that makes no sense at all. Those database names have to be the same if you're going to be using them for load balancing. You cannot specify in the same application that when you connect to this server, use this name or that server use that name, doing places the load balancing entirely on the software. Might as well also have the software also handle the replication. But the real problem is in the bucardo database. You have to have a separate name for each database even though they are named the same or you can't specify source and target of the same name. The misconception of mine came from the idea that bucardo would be talking directly to the other server's bucardo installation, but that's not the way bucardo works.

A good naming convention for dbnames is to specify the master and the slave. So, if I were doing a database on server1 as the master and database on server2 as slave and the name of both databases was store. I'd create two database entries in my bucardo table.

bucardo add db store_master dbname="store" host="localhost" port="5432" user="store" pass="password"

bucardo add db store_slave dbname="store" host="remote.store.com" port="5432" user="store" pass="password"

now the add sync uses the db names I just gave those two identically named databases as store_master, store_slave instead of store or localhost and remote.store.com.

I think it would be helpful to store the connections in a separate database in bucardo and identify the databases in those connections as a separate collection. I'm not sure if the current version I'm using will allow for that.

Now the problem I had, using the above info, I did my add sync like this

bucardo add sync first relgroup=first dbs=store_master,store_slave

The relgroup is another story altogether, but lets just say relgroup first was already defined as having all the tables in the database.

You'd think, based on the documentation that store_master was the source and store_slave was the target, but that's not what happened. According to the documentation, the first dbs is the source and the second and remaining dbs are the targets unless you override it with :source or :target.

When I tried to push the data with a one time copy to the target, it was actually pushing from what I intended to be the target. The result was adding a record to the actual source database caused an unique index error on the primary key. The settings can be reviewed by issuing the bucardo list sync command. this will show all the dbs being synchronized and which is the source and which is the target. When I flipped the source and target around opposite what the documentation said to do, it still made my source the target, so I had to specify exactly what I wanted:

bucardo add sync first relgroup=first dbs=store_master:source,store_slave:target

Then and only then did it correctly assign the source and target. Best practice before activating the sync is to check it. Not being specific, leaves the system to willie nillie make source and target assignments.

Another thing I noted is the source database gets another schema called bucardo, which is probably why the bucardo user needs to be a super user (just guessing though. I didn't provide the credentials for Bucardo in the connection string). The that schema has all the triggers and databases (delta) needed to perform the synchronizations. The delta tables fill up when you add or delete or update records in the main table. Then the delta tables are emptied as the updates are completed in the target tables.

About relgroups... I created my bucardo dbs but using the addalltables and addallsequences which didn't create the relgroups as I thought it would. These should have been added with relgroups with the same name as the db. I had to go find the information and create the relgroups manually.

The table names also need the schema they are in. So, if your tables are in the public schema which most are, then you'd add the tables and sequences as add table=public.table1. The documentation doesn't show this in the example provided

I have only added one database. I have several on the same server, but I wanted to make sure that the updates worked as expected and that one database has been updating fine.

One observation: If this is to be  used for load balancing, it can only be used for lookups. Any program would have to take careful consideration when assigning the database connection for updates, adds and deletes to the main database and look ups to a connection that could pull from either database, or only the slaves. There are areas that will need to possibly split the difference and if the main is down, simply skip over, such as when a client logs in. I the credentials and if he successfully logs in, I update his record with the date and time of the login. After considering this a little bit more... the web site depends on the data to even function, but there's no reason to add more content when the main database is down. However, information that can't be viewed unless the user is logged in won't be available. 

Written by Leonard Rogers on Thursday, August 1, 2013 | Comments (0)

Setting up a self-signed certificate for use with Windows 7 IIS 7.5 and Postgresql

Posted at 1:17:11 AM in SQL (4)

I was setting up a local environment to do some design work and since I set my Postgresql server to only accept certificates that were signed, I had to set this up on my development PC as well. I already had pgAdmin working with the certificates. The environment variables were correctly set, but I kept getting this error below.

Microsoft OLE DB Provider for ODBC Drivers

could not get home directory to locate root certificate file Either provide the file or change sslmode to disable server certificate verification.

What to do... 

Because the environment variables pointed to a location on my hard drive that IIS didn't have access to, I added those permissions. When I made changes on my 2003 server, I was able to start and stop the World Wide Web Publishing service and the environment variables were loaded. However, that did not work this time. I'm not sure what service needed to be restarted, but a reboot of the system properly loaded the variables and IIS stopped complaining about the location of the certificates.

Written by Leonard Rogers on Tuesday, July 30, 2013 | Comments (0)

Postgresql and client side SSL setup for Web Servers

Posted at 8:16:57 PM in SQL (4)

I'm using Posgtresql 9.1 on Ubuntu 13.x.x 64-bit. I actually don't remember if I installed OpenSSL on it or not, but those are the packages I used during this install. It is one of the more difficult tasks I've had to complete. A few nice things before I go into the details of installing SSL for the ODBC client on a Server 2003 IIS server for use with my web pages.

I noticed that the Postgresql 9.1 and OpenSSL already enabled SSL by default without authentication. That means that the data being transmitted between client and server was encrypted, but it was not verifying the certificate as being one that the server has approved. i.e. no key or crt was being provided by the client at all. And the client crt was not signed by the servers CA.

By default, there is a crt and key symlinked to the Posgtresql data directory (which on Ubuntu is located in \var\lib\posgtresql\9.1\main). In the data directory they are called server.crt and server.key. There was no root.crt which as I understand it contains the CA (certificate authority). Without root.crt, any client that requests an SSL connection will get one. As soon as you put a root.crt file in there, the requesting connection will be checked against that file, however, only verify-ca and verify-full will fail if it doesn't match.

The most helpful link was this one on howtoforge.com. The other link I read only explained how to make the server certificates (info on  xtuple.org). But nicely, xtuple.org also explains how to setup SSL on windows installations of Postgresql. I thought there would be a problem in creating the server certificates with the command line arguments of the xtuple.org and the command line arguments on howtoforage as I did the server ones and the clients ones using the different sites respectively. 

The only problem I ran into while setting up the server with self-signed certificates was the crt and key files already existing in the data directory. These are symlinked and provided by OpenSSL; linked from snakeoil.crt and snakeoil.key in the OpenSSL directory. Copying over the new files would not overwrite the symlinked files. I had to delete the symlinks first. I probably should have unlinked them then copied in the new certs, but I didn't have any problems.

After creating the client crts per the howtoforge.com site, I didn't know where to place them on the client server. The default location to look is the <user>\appdata\postgresql directory, but the web user IUSR does not have a profile and there for no appdata directory. This site gave a little more info, but the individual was trying to get the windows cert manager to host the certificates so it gets a little confusing. After seeing the environment keys that needed to be set the documentation on the postgresql site made a little more sense.

Basically, you need to set these environment variables:

PGSSLKEY="C:\path\to\my.key" <-- modified from the web site as capi is not part of OpenSSL.

That's done by right clicking my computer and selecting properties. Then open the advanced tab and at the bottom open the Environmental Variables button and add the variables to the system set in the bottom window. I replaced the capi:My name with postgresql.key in the same directory as the other path info. I was hoping it would take path and the ODBC client would search for they that worked, as this will create problems with connections to different replicated servers. I am guessing that I should be able to use the same keys and certs but add the CA authority to the root.crt file for any other servers that might be handling connections from this client, but I haven't tested that yet.

The last part is to change pg_hba.conf so that it only accepts SSL connections. To do that, edit the pg_hba.conf file which is located in /etc/posgtresql/9.1/main on Ubuntu. Add the line:

type database user address method options <-- don't add this line, for information purposes only

hostssl  all  all  cert clientcert=1

I also specified the addresses these connections can connect from. Technically though, if you are using a certificate and not a password, then it should be safe to expose access to any client that has a certificate.

There is also a suggestion that hostssl be put before host lines, but I'm not sure of the reason for that. I tested this with pgadmin. You can set the certificates for the database connection properties under the SSL tab. As successful connection shows in the properties for the connection SSL encrypted.

Written by Leonard Rogers on Friday, July 26, 2013 | Comments (0)

Working with Arrays in a Select Statement for Postgre

Posted at 11:17:40 AM in SQL (4)

This post is for PostgreSQL

I had a field in a database that consisted of city and state in the same field. It's a lot easier to create a single field out of two fields than it is constantly have to split that field up. For example, If you have a field called city and a field called state, you could easily make a citystate field that could be represented thusly: city+', '+state. The other way around requires finding the comma, removing the leading space, handling the exception that there is no comma or that there is no state or that there is only a state and no city (and those exceptions may have to be looked at for making a composite field like citystate, but it is easier to not print a comma if one fields [city or state] is missing).

What I decided to do was create an array out of the already composite field and split it on the comma. In Postgres, that function is string_to_array. However, string_to_array(citystate,',') produced an array that had double quotes around the state, in order to preserve the leading space. So I reverted to this: string_to_array(replace(citystate,' ',''),',') which gave me a nice 1 dimensional array with 2 elements (city in element 1 and state in element 2), but extracting an element from a function call is quite another problem. Since the function is inside of a select query there is no feature which allows us to make temporary assignments so we could assign the return value of a function to a temporary variable in order to do more operations on that variable. Of course, this is what stored procedures (or stored functions for Postgre) are for. We could easily pass this into a stored function and get the value back out, but is it possible to get the element out of the select statement without using a stored function?

Our table has 2 elements:

CREATE TABLE capitals (
id serial,
citystate character varying

INSERT into capitals (citystate) values ('LOS ANGELES, CA');

SELECT string_to_array(replace(citystate,' ', ''),',') as data from capitals;



Dang it, I stepped in it big time.

I was hoping the get the element out of the select statement by appending the element index on the end of the function, which I thought wouldn't work... and it didn't, like so:

SELECT string_to_array(replace(citystate,' ', ''),',')[1] as data from capitals;

Which returns an error. I just read on this post here that surrounding the entire expression would return the element like so:

SELECT (string_to_array(replace(citystate,' ', ''),','))[1] as data from capitals;

And this worked. This is what I ended up doing which made me decide maybe a stored function would be a better answer, but I got this from the Postgre documentation and then convoluted it to fit my situation, here it is for the dart board:

SELECT (SELECT f1[1] as city from (SELECT string_to_array(replace(citystate,' ',''),',') as f1) as ss) from capitals;

It worked of course, but I didn't want to do it again for the state.  I got the sample I followed from this site. As all the functions and examples always returned arrays, I needed one that returned the value of an element and there was only one on that page that did it. 

Written by Leonard Rogers on Thursday, March 7, 2013 | Comments (0)