Today I was developing a script that takes data from an uploaded file and uses it to start a batch job. I built the front end with the help of Uploadify and everything went fine, so I quickly wrote a PHP script to read the data.

I’m using the file() function (I’m lazy, I know) and it was working fine. I wrote a test case that displayed the output on the console, and everything worked as expected. I played a little with the flags and added a couple of validations to handle Windows, classic Mac and Unix line endings, and it all looked good. So I moved on to the model part, and after my first test inserting some data into the db I had a very weird problem.
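As a rough illustration of the line-ending handling, here is a minimal sketch that splits a raw string on all three conventions at once. It is not the validation code from my script; the sample string and the variable names are made up for the example.

```php
<?php
// Hypothetical sample mixing Windows (\r\n), classic Mac (\r) and Unix (\n) endings.
$raw = "a\r\nb\rc\n\nd";

// Try \r\n first so a Windows ending is not split into two separate breaks.
// PREG_SPLIT_NO_EMPTY drops blank lines, similar to FILE_SKIP_EMPTY_LINES.
$lines = preg_split('/\r\n|\r|\n/', $raw, -1, PREG_SPLIT_NO_EMPTY);

print_r($lines);
```

This should leave the four entries a, b, c and d, with the blank line dropped.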

Something was definitely wrong with the first line. I couldn’t see any weird character from the mysql console, but I noticed a strange character when I viewed the row in MySQL Query Browser. Whenever I printed the result it seemed ok, and I also debugged the queries and they looked ok, but in MySQL Query Browser a weird square kept showing up in the first row.

I thought this was an encoding issue, so I started looking into it. What I learned today is that some editors add an apparently useless Byte Order Mark (BOM) to the beginning of UTF-8 encoded files. These are the bytes added: 0xEF, 0xBB, 0xBF.
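Since the BOM is invisible when printed, a hex dump of the first bytes is one way to spot it. The snippet below is only a sketch: the sample line is made up, and in a real script the string would come from the first line of the uploaded file.

```php
<?php
// Hypothetical sample: the three BOM bytes followed by ordinary text.
$firstLine = "\xEF\xBB\xBF" . "id,name";

// Echoing $firstLine would look normal in most terminals,
// but the hex dump of the first three bytes gives the BOM away.
$prefix = bin2hex(substr($firstLine, 0, 3));
$hasBom = ($prefix === 'efbbbf');

echo $prefix, PHP_EOL;   // efbbbf
var_dump($hasBom);       // bool(true)
```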

I spent almost an hour trying to find a workaround, and after some testing this was my solution:

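// Read the file into an array of lines, dropping line endings and empty lines.
$data = file($fullFileName, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($data) {
    // Remove the UTF-8 BOM (0xEF 0xBB 0xBF) from the first line, if present.
    if (substr($data[0], 0, 3) === pack("CCC", 0xef, 0xbb, 0xbf)) {
        $data[0] = substr($data[0], 3);
    }
    $this->processData($data);
}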

I’ve tested this with ANSI and UTF-8 encodings and with Windows, Unix and classic Mac (CR) line endings, and it worked. There might be a better solution, but this has already taken too much of my time today, so I’m good with it.
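One possible tidy-up, if the same check is needed in more than one place, is to move it into a small helper. This is just a sketch: stripUtf8Bom() is a name I made up for the example, not something from PHP or from my script.

```php
<?php
// Hypothetical helper: returns the string without a leading UTF-8 BOM.
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    // strncmp() returns 0 when the first 3 bytes are equal to the BOM.
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

// Made-up sample inputs.
$clean     = stripUtf8Bom("\xEF\xBB\xBF" . "id,name"); // BOM removed
$untouched = stripUtf8Bom("id,name");                  // no BOM, returned as-is
$empty     = stripUtf8Bom("");                         // empty input stays empty

var_dump($clean, $untouched, $empty);
```

With a helper like this, the check in the script would presumably shrink to a single line such as $data[0] = stripUtf8Bom($data[0]);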