..a dose of zero-day know-hows ..

12/06/2008

splitting large sql files using split

The split command have a couple of essential switches, one may either split per line,per bytes or line per bytes. Lets see which is best applicable for SQL dumps.

Splitting per line:
Splitting a very large SQL file generated from mysqldump will not be optimal and effective to be splitted using the per line switch, simply because data from one table will take up one line only regardless how large it is making the split almost useless.

Splitting per byte:
Splitting an SQL file per byte can be effective at times, but there is no guarantee that the the input file will be splitted in the proper string positions. SQL errors is likely to happen in situations where the splitted sql files contain incomplete query at the end because it is terminated in between a line.

Splitting line per byte:
This is the best way to split a huge sql file. It splits the SQL file in bytes specified but takes into account the line so it does not split the SQL file in between. Ex:


split -C 40312832 data.sql 


In the example above, data.sql is splitted into several files (xaa, xab xac... xaz..) and tries to fit as much line to the specified bytes. Although the output splitted files will not have the exact size as the byte specified, split will try to fill as much but making sure that no line is truncated between, thus assuring that the query will be valid upon execution.

Availability:
split command is part of the GNU coreutils and is available in almost all linux distributions and *nix based operating systems. In some OS's like Solaris 9 Sparc, you may need to upgrade the coreutils package that is prebundled with the system to have access to split binary that supports the -C switch.