Boot Camps: Effective or Naw?
Trying to get into the field in an entry-level position with little experience? Or are you switching from a similar career (engineering, academic research, computer science) and just need to brush up on the tools?
Do you have trouble with the unregimented, unmonitored environment of Coursera and other MOOCs, textbooks, or YouTube channels? Are you also turned off by multi-year Master's degrees and their bloated tuition?
Are you able to take 3+ months off or work heavily during evenings?
If you answered yes to all of the above, then a boot camp might be right for you.
As a mentor for a data analytics boot camp, I often get asked whether boot camps are worth it. Up until this weekend, I only had the information the boot camp gave me and some cursory exploration to inform me, so I decided to educate myself further. Going in, it seemed like the prospects of boot camp graduates were all over the place. My recommendation to my students was to really focus on their portfolios, as that's what would set them apart. Largely, that still feels true. But let's investigate the whole thing.
First: Your Relative Goal
To me, this is the most important factor in determining the value of education. Each person's end goal takes a different amount of energy and time to reach than a peer's. Data Analysts need to know some key metrics about the field they're in, but much of their work is in SQL, Excel, and Tableau/Power BI. The stats-y and model-y stuff often gets saved for Data Scientists, who have usually studied quite a bit of math and have coded for an egregious amount of time. Analyst can be a good stepping stone for those new to the field looking to rise. It's also a wonderful entry-level role, given that there's an opportunity to advance. How far you want to go, how quickly, at what cost, and where you are now are all factors in deciding what to do next.
For example, boot camps are super effective for PhDs who already have the necessary coding and mathematical backgrounds to become data scientists. Many folks in academia are already doing lots of coding, analysis, and thinking on another plane of existence. They can transition very quickly, even directly into senior roles. You'll find evidence of this in the testimonials of many boot camps and in forums like Quora.
For everyone else, keep in mind that boot camps are only going to be 2-6 months long. That's just not enough time for someone to learn calculus, statistics, matrix algebra, how to code, how to scrub data, build and compare models, visualize data, present persuasively, then deploy a production-quality model in a second programming language. Don't expect to come out of one a demigod draped in a cloak of wisdom.
Here's an alternative scenario that tends to work. There are tons of analyst roles that require smart people to manipulate data, pull from databases, ensure proper inputs, make some visuals or dashboards, and offer some high-level insights. I teach a Data Analytics course, which stops at introductory coding. It takes six months, with more of that time spent on Excel, data visualization, critical thinking, and presentations than on code. This route works pretty well for those who had mathy backgrounds in college, who use Excel at an intermediate level, or who are looking for a comprehensive refresher. If you've not used Excel and are very rusty in math and functions, I'd say some preparation is needed first. Most boot camps have some guidelines before they'll accept students.
For those who are wondering if a boot camp can replace a Bachelor's/Master's degree, consider the following. I wasn't very competitive as a data scientist until I had a very robust background: dozens of R projects (simulation, Bayesian analysis, NLP work in academia, all types of regression, machine learning and stacking, and indescribable data cleaning), several Kaggle computer vision projects with TensorFlow in Python, figuring out Spark on Databricks to speed up and manage my sklearn models, and independently building client-facing dashboards in Tableau. In interviews, I'd still be asked about my experience in areas where I had none. The field is too broad to be fully prepared for everything, and it takes thousands of hours just to become competitive. Even before all of this work, I studied 800 hours to pass the SOA's Probability exam and roughly 300 for their Financial Mathematics exam, and I already had a degree in Statistics! In large swaths of job postings, Master's degrees are still listed as required (though they're likely just nice-to-haves in some instances). The main takeaway here is that a boot camp cannot possibly replace years of study and on-the-job experience.
However, surveys from job boards like Indeed report that 84% of employers find boot camp graduates to be as prepared as candidates with undergrad degrees, or more so. That's a huge number, and a great sign for boot camps. It might also be somewhat of a complaint about the disconnect between universities and the workforce.
This is primarily for those who are considering dipping into this field for the first time. Before you take that dip, throwing down thousands of dollars and just as many hours, consider if you want all of the baggage that comes with the gig. Last decade, Harvard Business Review called data scientist "the sexiest job of the 21st century". The reasons included the large job growth, the median income of roughly $150k (via the Stack Overflow data set used in previous posts) or $113k via Glassdoor, and the high job satisfaction driven by things like their persuasive influence on executives and ability to drive decisions. Honestly, there are some days I can't fathom that this is what I do. I'm incredibly joyful. But here are some realities that need to be battled before you get to the boss fight.
There is a growing number of data scientists who are leaving the field. Well, why the hell would they do that?! I don't think I can explain it much better than the linked article does, but you face a few battles every day:
Expectations of a data scientist are buck wild and some think:
You only do meandering, menial tasks in Excel.
You build an entire data warehouse, figure out how to collect data, and then do the necessary things to collect it.
You can solve all data problems: Excel tables, reports in Tableau, building production models in Python, writing surveys in Qualtrics, building sweet new algos too hot to handle, and running complicated, academic-level simulations.
Learning and challenges eventually dry up:
After years of buildup, you start to maintain models that took you years to develop. The company is fine with using these forever, just so long as you oil them up a bit over time. You lose your mind and become ethereal.
There's no opportunity to work on new projects or develop new products at your place of work. It's worse if there is no ability to contribute across teams.
The regular day-to-day activities are hard:
It's true. You'll spend the vast majority of your time cleaning data in most cases. This is not easy. Data might come from dozens of different places, so you may need to write algorithms to put them into one format. You may work with dirty text that needs one true spelling (how many ways can we misspell Orville Redenbacher?). Some fields might need to be translated, and you'll have no help figuring out that Lenovo's name in China is the Chinese word for "association". There will be many times you'll have to defend your choices, with no precisely right answer. Do you remove this missing data? Do you impute it? Why did you do that? Should you collapse counties into states or into regions? What effect does that have? How dare you!
If you're not cleaning data, you'll be tinkering with slight variations of things, often with no guidelines. You'll have to defend your choices here as well. There are many methods of cross-validation to choose from, many hyperparameters to try for a model, and superstitious quantities of models to choose from. And someone will end up asking you, "is this a regression?" "No, Bob. These are the unexplainable results of a multinomial-outcome gradient boosted machine optimized across 15,000 hyperparameter combinations using repeated 10-fold cross-validation with a train, validation, and test set imputed with CART. No, this is not a regression."
Many people in the data world face political battles across teams. You want to go help the mechanics improve their profitability? Well, handing over their data might put them in a lose-lose. Maybe they've been slow-going recently. Maybe they don't want you to see that they've had to add discounts just to get customers in the door. Maybe it feels like an attack that some outsider will tell them what to do. Maybe if you find they are performing well, someone will push them to get to the next level, adding another heap of stress on this team. There are many reasons other teams can be uncooperative.
Sometimes I have to boil down variable importance for a CMO or explain some mess of set theory to an exec. You'll eventually be challenged to explain these things, and often they're a part of your process that you gloss over or don't fully understand yourself.
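To make the "one true spelling" problem from the data cleaning point above a little more concrete, here's a minimal sketch in Python using the standard library's difflib. The brand list, the canonicalize function, and the 0.8 similarity cutoff are all made up for illustration. A real project would pull its canonical names from a reference table and tune the cutoff by eyeballing the misses.

```python
import difflib

# Hypothetical list of "true" spellings; in practice this comes from a reference table.
CANONICAL_BRANDS = ["Orville Redenbacher", "Lenovo", "Pop Secret"]


def canonicalize(raw, choices=CANONICAL_BRANDS, cutoff=0.8):
    """Return the closest canonical spelling, or None if nothing is similar enough.

    The cutoff is an assumed starting point, not a recommended value.
    """
    # Collapse stray whitespace and ignore case before comparing.
    cleaned = " ".join(raw.split()).lower()
    lookup = {choice.lower(): choice for choice in choices}
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    print(canonicalize("orvile redenbacker"))  # Orville Redenbacher
    print(canonicalize("  LENOVO "))           # Lenovo
    print(canonicalize("Acme Widgets"))        # None
```

Returning None instead of a best guess is deliberate: anything that doesn't clear the cutoff goes to a human, which is exactly the kind of choice you'll be asked to defend later.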
You will need to keep learning for the rest of your life:
I've had to learn Python, Databricks, Bayesian modeling in JAGS, Power BI, LIME models, propensity score matching, Git, and Azure cloud services in just the last 4 years. None of these were things I had to do when I started, but they are/were important for some project to progress.
For computer vision, there are new algorithms or network stacks that are "state of the art" every time I check. CNN, ResNet, YOLO, U-Net, Siamese network...it just kept going! I also need to relearn some of these every time I use them.
Even familiar packages change now and again. Small changes in the dplyr package broke many scripts and others no longer existed. It can be hard to remember how different versions work once you've learned one.
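One cheap defense against the package churn described above is to write down the versions a script was tested against and check them at startup. Here's a rough sketch in Python (the dplyr story is R, so treat this as an analogy rather than the fix I used). The helper names and version strings are hypothetical; importlib.metadata is the standard library module for reading installed package versions.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.0.10' into (1, 0, 10); trailing non-numeric bits like 'rc1' are dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def same_minor(installed, tested_against):
    """True when the major and minor numbers match the versions the script was tested on."""
    return parse_version(installed)[:2] == parse_version(tested_against)[:2]


def installed_version(package):
    """Look up the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical version strings, just to show the comparison.
    print(same_minor("1.0.10", "1.0.2"))  # True
    print(same_minor("1.0.0", "0.8.5"))   # False
```

Comparing tuples of integers rather than raw strings matters here, because as plain text "1.0.10" sorts before "1.0.9".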
So ask yourself: are you down to clown?
Third: Two Moves from Now
Here comes the strategy talk. Do you want to write algorithms that others will use across the planet? Are you hoping to become the next CIO or CDO? Do you want to take part in famous labs or lead the best data teams in the country? Well, you might want to just start the long road to a PhD. Many of the top positions require one, or they require an equivalent body of research. It's fully possible to get there without one, but the PhD path is more straightforward.
Are you looking to lead a team of data scientists? Do you want to fast-track to that role? Maybe a Master's degree is better. Again, there is an education block instituted by HR. I don't think a Master's degree is truly a stronger indicator of a data scientist's ability than a completed boot camp. Work experience and proof in the form of projects and portfolios are the growing preference. But, given the greenest candidate possible, HR departments still lean towards formal education, and there are often so many applicants that education is just another way to filter down to a manageable number of resumes for a team to read. There are other pluses to enrolling in a Master's program. There is a heavy emphasis on math and formalities that don't seem to be drilled anywhere else. I do think knowing the assumptions of linear regression and the calculus behind backpropagation is important, but they only seem to really matter in creative or leading roles. So much work is applied and uses renowned packages. Occasionally, pressure forces teams to bend the rules of math and just accept whatever works for them. The necessity of this knowledge fluctuates across the field, but you can 100% get away without being able to derive backpropagation by hand. It's easier to get this info from a formal education, but my Master's degree still spent too much time on breadth. At least early in your career, a rigorous boot camp will prepare you faster for the technical aspects of working.
Fourth: The Learning Environment
All fields face this question. People learn differently and at different paces. Luckily, data science is still open enough to include those who take non-traditional paths. I 100% believe that higher education is not necessary to excel as an analyst. There are now multiple acceptable ways of getting prepared. The industry you work in may require certifications or another way to prove your competence, but it's unnecessary to have a data science degree paired with an MBA or a PhD in Public Policy. Those degrees prepare you for solving the problems within an industry, but data science focuses first and foremost on the tools and algorithms.
For those who are uber disciplined, self-education through textbooks, edX, Coursera, and DataCamp is starting to become a reasonable route. However, these lack real, tangible products by which folks can judge experience and skill level, so they will not work by themselves. In fact, the general consensus is that the credentials provided by MOOCs are nearly worthless. The courses are often too simplistic, lack real-world preparation, are easy to fake your way through, and leave you with few assets to show off at the end. This route is only fruitful with impressive projects that include cleaning data, show deep thought about a problem, demonstrate the iterative process of analysis, and are convincing. The closer one can come to simulating a working analyst, the more effective that candidate becomes. Needless to say, this is a lot to take on unguided. It's still fairly unconventional, but it does work.
If self-learning in your isolated monastery, punishing yourself daily, sounds like masochism, then a boot camp is the next best option in terms of price. Master's programs will run you $30k+ and take a minimum of 1 year. Most take 1.5 to 3 years. Many used to be in-person only, too, meaning you would have to move to that institution. The costs only rack up from there if you can't keep an income up to a low boil and have to finance a hut to live in. Boot camps have much lower price tags. Springboard's Data Analytics program is online, mentor-guided, and costs $6.6k. It is less rigorous than many on-site boot camps, but those require cohorts to meet nearly every day for 8+ hours. Especially for folks changing careers, it's very hard to take 3 months off to make a boot camp your full-time job. I've read a handful of posts from TAs who don't think the curriculum matches the costs. It's definitely a premium to pay, but it returns the luxury of guidance and support. Working together with a cohort is also such a joy. There's nothing better than hammering through the hyperparameters of a predictive model with a team trying to conquer it with you.
The layout has been presented. Many of these arguments are things potential boot camp students have already considered. Keep in mind that the boot camp ends up just being a line on your resume. The candidate's ability to contribute is what people want to see, and piles of unimpressive projects have led many employers to write off boot camp students. Make sure that the challenge is aggressive. Build a portfolio that pushes skill and knowledge growth. Get feedback on your work from professionals. Certificates of completion really don't mean anything. There are too many people with great credentials in the same competition. It's a mistake to think that classes or boot camps are enough by themselves. There are too many budding analysts to be looking like a bozo.