Character problems in files

I always recommend that my clients avoid special characters in file names, whether the files are images, PDFs, or anything else.

What counts as a special character? Anything outside the “normal” unaccented letters (no accents, no ç, no ñ) and numbers.

Ideally, WordPress would “clean up” those file names and upload them with standard names: lowercase, no spaces, and only letters and numbers, with nothing weird.
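As an illustration of what such a clean-up could look like before uploading, here is a minimal sketch. The function name sanitize_name is hypothetical and this is not what WordPress does internally; it simply lowercases ASCII letters and turns everything else that is not a letter, number, dot or underscore into a hyphen. Note that it is lossy: an accented letter becomes a hyphen, not its unaccented equivalent.

```shell
#!/bin/sh
# sanitize_name: hypothetical helper, not part of WordPress.
# Lowercases ASCII letters, replaces every byte outside a-z, 0-9,
# '.' and '_' with a hyphen, then squeezes runs of hyphens into one.
# LC_ALL=C makes tr work byte by byte, so the result does not depend
# on the locale of the machine running the script.
sanitize_name() {
  printf '%s\n' "$1" \
    | LC_ALL=C tr 'A-Z' 'a-z' \
    | LC_ALL=C tr -c 'a-z0-9._\n' '-' \
    | LC_ALL=C tr -s '-'
}

# Example: prints camiseta-ni-o-rojo.jpg
sanitize_name "Camiseta Niño Rojo.JPG"
```

A name that is already clean, such as photo_01.png, passes through unchanged.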

The problem

Often, though, we inherit old installations with many entries, incorrect encoding in the database, and so on. What I found this time, when I moved a website with many images of this type (like casmiseta-niño-rojo-chillón.jpg) to a custom Vultr HF server, was that these files were displayed with “strange characters”.

Worse, the names on disk no longer matched those stored in the database, so every request for one of these files returned a 404 error.

Even when I uploaded a file over SFTP with FileZilla forced to UTF-8, it showed up on the file system with the strange characters.

[Screenshot: forcing UTF-8 in FileZilla]

I also tried compressing the files into a zip, uploading it and decompressing it on the server, with the same result. That pinned the problem down: it was the operating system's UTF-8 support.

Note: this will not happen on most hosting, since servers are normally configured to work with UTF-8 by default.
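Before touching the server configuration, it can help to know which files are actually affected. The following is a minimal sketch, not the procedure I followed at the time: list_non_ascii_names is a hypothetical helper that prints every file whose path contains a byte outside printable ASCII. It assumes find and grep are available, and the uploads path in the comment is only an example.

```shell
#!/bin/sh
# list_non_ascii_names: hypothetical helper.
# Prints the files under the given directory (default: current one)
# whose path contains at least one byte outside printable ASCII.
# LC_ALL=C makes grep compare raw bytes, so accented letters stored
# as UTF-8 are matched whatever the system locale is.
# A non-ASCII directory name also makes the files inside it match.
list_non_ascii_names() {
  find "${1:-.}" -type f -print | LC_ALL=C grep '[^ -~]'
}

# Example (the path is an assumption about the site layout):
# list_non_ascii_names wp-content/uploads
```

The function returns a non-zero status when nothing matches, which makes it easy to use in an if statement.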

The solution

The obvious solution was to enable UTF-8 support. First we check the locale of the system:

# locale
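If you look after several servers, the same check can be scripted. This is a minimal sketch under the assumption that looking at the value of LANG is enough; is_utf8_locale is a hypothetical helper that only inspects the string it is given, so it does not take into account an LC_ALL that overrides LANG.

```shell
#!/bin/sh
# is_utf8_locale: hypothetical helper.
# Succeeds when the locale name passed as $1 declares a UTF-8
# charset, written either as "UTF-8" or "utf8" in any letter case.
is_utf8_locale() {
  case "$1" in
    *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the LANG of the current session.
if is_utf8_locale "${LANG:-}"; then
  echo "LANG declares UTF-8: $LANG"
else
  echo "LANG does not declare UTF-8: ${LANG:-unset}"
fi
```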

On this server, the output shows LANG set to en_US, as are the other variables, instead of LANG=en_US.UTF-8 (or LANG=es_ES.UTF-8 if the system is configured in Spanish). So we need to generate the UTF-8 locale, for which we run:

# locale-gen "en_US.UTF-8"

Or if we are configuring it for Spanish:

# locale-gen "es_ES.UTF-8"

And then:

# dpkg-reconfigure locales

We follow the steps, selecting our locale in the list and then choosing the same one as the default:

[Screenshot: dpkg-reconfigure locales]

The system is now configured correctly. It should have been UTF-8 by default already; in this case it was not, due to a small bug in the installation.

We can reload bash by running source ~/.bashrc (or, with another shell such as zsh, the equivalent source ~/.zshrc) and check with # locale that it is now configured correctly. If the old values still appear, log out and back in, since the default locale is read at login:

[Screenshot: locale output on Vultr HF]
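Instead of reading the whole output, the final check can also be reduced to a single value with locale charmap, which prints the character encoding of the active locale. A minimal sketch, where charmap_is_utf8 is a hypothetical helper that accepts an optional value to compare so it can be tried without changing the system:

```shell
#!/bin/sh
# charmap_is_utf8: hypothetical helper.
# Succeeds when the charmap is exactly "UTF-8". With no argument it
# asks the system through `locale charmap`; an argument replaces
# that value, which is handy for trying the function out.
charmap_is_utf8() {
  [ "${1:-$(locale charmap)}" = "UTF-8" ]
}

if charmap_is_utf8; then
  echo "The active locale uses UTF-8"
else
  echo "The active locale is not UTF-8 yet"
fi
```

On a system still using plain en_US or the C locale, the reported charmap is something other than UTF-8 and the second message is printed.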
