How to transfer data from MS SQL to PostgreSQL or “good” design vs speed
TL;DR
- The fastest way certainly is the shortest one. You can use foreign data wrapper and map the MS SQL table to your Postgres instance (using tds_fdw)
- The design could be good only if it suits its primary goal. And you need to create prototypes carefully. Because they can be deceptive.
Details
So, we had the task to rewrite the Old System and decided not to use the existing database schema(to prevent leaking of abstractions). The Old System database was MS SQL and for the new one, we choose Postgres. Naturally, we created an AntiCorruption layer and started simple. Getting Aggregates (DDD) from Old System and saving them one by one. We’ve been told time is not the issue and the algorithm could work slowly. And the hope was to drop Old System as fast as possible. So it seemed even silly to invest a lot of time in designing the AntiCorruption Layer.
We came up with a design like this
It worked and initial data for testing and developing was received. We proceed with New System and decided to finish the AntiCorruption Layer than our domain model shape up (smart decision, right). Guess what, we were almost at the release point and synchronization was not ready. Even the first rough solution was not working, because schema evolved a lot. So, we introduced a more robust sync algorithm and updated AntiCorruption Layer. Now the design looked something like that
Now it is obvious that the layer was overdesigned, but we were trying to get it right. The solution worked ok and we turned full synchronization. And it worked. Like expected. Except it took 5 days. The simple tweaks help to reduce time to 40%, but obviously, 2 days was not acceptable. We had one day before the deadline and no good solution.
So I decided to try something simple and quick. And found out that all the work could be done in Postgres. One can use fdw (foreign data wrapper) with tds_fdw. You can simply query data from another database. To make this even faster we stored temp data from MS SQL in unlogged Postgres tables and then perform the required operations. It looks like this
create extension tds_fdw;create server mssql_svrforeign data wrapper tds_fdwoptions (servername '1.1.1.1', port '1433', database 'dbName');create user mapping for SomePosgresUserserver mssql_svroptions (username 'SqlUser', password 'Password');create foreign table tableNameInPostgres("integerColumn" integer,"textColumn" text)server mssql_svroptions (schema_name 'dbo', table_name 'tableNameInSql');select *
into unlogged tableNameInPostgres_temp
from tableNameInPostgres;
And that’s it. Of course, you need to spend some time installing tds_fdw — but they have pretty straightforward guidelines. How long do you think does it took to sync 2 databases? Bear in mind it took almost 5 days the first time.
The answer is less than a minute! Clearly, you can blame our overdesigned solution and lack of coding skills, but I think it’s not that easy. My take would be — make the solution as difficult as it needs to be (follow YAGNI) and try to find ways in different directions (we were sure there is no way to make cross different database communication directly and long time was reluctant to search for tools to accomplish this kind of communication)
P.S. I made a schema for the final solution to emphasize the difference
References: