How to ignore Invalid byte sequences in a Postgres Dump File
Monday, November 15th, 2010I needed to restore a client Postgres database recently. However I discovered the pg_dump v8.3.9 dump utility produced invalid byte sequences in the dump file.
When importing the file using the following v8.4.5 psql command:
- psql -U username dbname -f pg-dbbackup.YYYYMMDD.dump
Each and every invalid byte sequence in the dump throws an error like the following one at a time:
psql:pg-dbbackup.20101115.dump:45711: ERROR: invalid byte sequence for encoding “UTF8″: 0xc8ed
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by “client_encoding”.
CONTEXT: COPY PostRequestField, line 13274
Note, you can identify the line where the byte sequence occurs by finding the line where the ‘COPY “TableName” FROM stdin;’ statement occurs. Add this line number with the 13274 reported above and this should be the line where the byte sequence occurs in the dump file.
Note, if you have one invalid byte sequence, you may have thousands of other invalid byte sequences. One way to deal with the invalid byte sequences in one fell swoop is to use the iconv command.
iconv -c -f UTF8 -t UTF8 pg-dbbackup.YYYYMMDD.dump > pg-dbbackup.YYYYMMDD.dump.txt
pg-dbbackup.YYYYMMDD.dump.txt should import without invalid byte sequences now. Note you might be stripping important information out of your data. But at least the dump re-imports into Postgres database and you can do things with it.