Archive for the 'Postgres' Category

How to ignore Invalid byte sequences in a Postgres Dump File

Monday, November 15th, 2010

I needed to restore a client Postgres database recently.  However I discovered the pg_dump v8.3.9 dump utility produced invalid byte sequences in the dump file.

When importing the file using the following v8.4.5 psql command:
- psql -U username dbname -f pg-dbbackup.YYYYMMDD.dump

Each and every invalid byte sequence in the dump throws an error like the following one at a time:

psql:pg-dbbackup.20101115.dump:45711: ERROR:  invalid byte sequence for encoding “UTF8″: 0xc8ed

HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by “client_encoding”.
CONTEXT:  COPY PostRequestField, line 13274

Note, you can identify the line where the byte sequence occurs by finding the line where the ‘COPY “TableName” FROM stdin;’ statement occurs. Add this line number with the 13274 reported above and this should be the line where the byte sequence occurs in the dump file.

Note, if you have one invalid byte sequence, you may have thousands of other invalid byte sequences. One way to deal with the invalid byte sequences in one fell swoop is to use the iconv command.

iconv -c -f UTF8 -t UTF8 pg-dbbackup.YYYYMMDD.dump > pg-dbbackup.YYYYMMDD.dump.txt

pg-dbbackup.YYYYMMDD.dump.txt should import without invalid byte sequences now. Note you might be stripping important information out of your data. But at least the dump re-imports into Postgres database and you can do things with it.