Rails migrations, and a painful lesson
The other day, while we were working on one of our biggest Rails apps, one of my colleagues said something that grabbed my attention.
"I think I'm about to break all of the migrations."
Despite my best efforts to fix this problem verbally (I was quite content in my chair), I went over to take a look. He was right: something was seriously wrong.
He had just created a new migration, but running rake db:migrate was giving the message that all migrations had been run. We could see that wasn't the case, as the new migration was sitting there, and it wasn't in the history of run migrations either. What was going on?
On closer inspection we noticed that the migration file wasn't ordered correctly in the migrations directory - it should have been at the end when listed and sorted, but it was somewhere in the middle. That meant that either its version number was wrong, or the version numbers of the migrations listed after it were. The default version prefix for migrations is the date stamp, which is made up of the year, month, day, hour, minute and second, with no separators. For example, "20130729092035" is the time and date right now. The date stamp for the new migration was absolutely fine, but there was something very wrong with the 20 or so migrations that followed: they started with "2015".
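A check along these lines would have flagged the problem much earlier. This is only a rough sketch, not something we had in place at the time: the helper names (migration_version, version_time, future_dated) are made up for illustration, and it assumes that every migration file name starts with a 14-digit date stamp followed by an underscore.

```ruby
# Hypothetical helper: pull the 14-digit version prefix out of a
# migration file name, or return nil if there isn't one.
def migration_version(filename)
  File.basename(filename)[/\A(\d{14})_/, 1]
end

# Turn a version such as "20130729092035" into a UTC Time.
def version_time(version)
  year, month, day, hour, minute, second =
    version.unpack("a4a2a2a2a2a2").map(&:to_i)
  Time.utc(year, month, day, hour, minute, second)
end

# Return the file names whose date stamp lies after `now`.
def future_dated(filenames, now: Time.now.utc)
  filenames.select do |name|
    version = migration_version(name)
    version && version_time(version) > now
  end
end

# Example file names, invented for illustration.
files = [
  "20130701101500_create_users.rb",
  "20150712093000_add_email_to_users.rb",
  "20130729092035_add_roles.rb"
]

puts future_dated(files, now: Time.utc(2013, 7, 29, 9, 20, 35))
# prints "20150712093000_add_email_to_users.rb"
```

Pointed at the real migrations directory (for example with Dir.glob) as part of the test suite, anything it returns would be worth a second look before merging.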
What must have happened is that one of our developers' PCs had an incorrect date for a period of time, which meant that every migration generated on it during that period got an incorrect date stamp. It explained a whole host of slightly weird issues that we'd been having for the last couple of weeks. Amazingly, no-one had noticed. But then again, when do you check date and timestamps in file names?
In the end, we fixed it by using git to tell us when the migrations were committed, and renaming all the affected migrations with corrected date stamps. We then had to clear down everything in the schema_migrations table and re-populate it to match the current schema. This then had to be repeated on our staging server - fortunately the project wasn't yet at the point where it could go live, so there were no production server issues to deal with.
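For the curious, the renaming step can be sketched roughly like this. It is a reconstruction rather than the exact commands we ran: first_commit_time and renamed_migration are hypothetical helper names, and the sketch assumes it runs from the root of the git repository with git available on the path.

```ruby
# Hypothetical helper: ask git for the commit time (as a Unix timestamp)
# of the commit that first added the file. git log lists the newest
# commit first, so the last line of output is the original addition.
def first_commit_time(path)
  out = IO.popen(
    ["git", "log", "--diff-filter=A", "--format=%ct", "--", path],
    &:read
  )
  stamp = out.lines.last
  stamp && Time.at(stamp.to_i).utc
end

# Build the corrected path by swapping the 14-digit prefix for one
# derived from the given time, keeping the rest of the name unchanged.
def renamed_migration(path, time)
  dir, base = File.split(path)
  suffix = base.sub(/\A\d{14}_/, "")
  File.join(dir, "#{time.utc.strftime('%Y%m%d%H%M%S')}_#{suffix}")
end

# Example with a fixed time, so it runs without a repository.
puts renamed_migration(
  "db/migrate/20150712093000_add_email_to_users.rb",
  Time.utc(2013, 7, 12, 9, 30, 0)
)
# prints "db/migrate/20130712093000_add_email_to_users.rb"
```

One caveat: migrations added in the same commit would get identical date stamps this way, so those need nudging apart by a second or so by hand. The new version numbers are also what ends up in the schema_migrations table, which is why that table had to be rebuilt afterwards.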
The crazy thing is that this slipped through our QA process. Every bit of code is submitted through pull requests, and each pull request has to be approved by two other developers. Yet 20 migrations with completely wrong date stamps slipped through. The main problem is that this isn't particularly surprising, and I'm not really sure I've learnt much. You can't predict everything going wrong, and you can't account for, or look out for, every bizarre edge case like an incorrect system date. Sometimes you just have to deal with the issues when they appear.