Sumana Bhlapibul, Lead IT/Data Consultant
"It is nice to be important but it is more important to be nice." ~John Templeton
In March 2018, a friend who worked at Microsoft sent me information about “Azure Academy for Women – Data Science”. Though I did not intend to get back into the job market, I follow technology, especially technologies that improve quality of life. I did not recall ever coming across the term ‘Data Science’.
I was enjoying my life with as little to do with the computer as possible. I was dreading the thought of having to sit long hours in front of one. But I wanted to broaden my horizons. And I wanted to show to the world that women are very capable. So, I decided to enroll in the program.
During the program, it felt like I was shot to the moon with those new tools (nerd toys) and capabilities. Some tasks that I would once have spent days developing code for were now possible with a few lines of code or a few drag-and-drop actions.
Before I also shoot you up in space, let me first share what I discovered on my Data Science Journey.
"Data Science" Program
The program is designed to equip you with the basic skills to tell a story with data. I was exposed to, and learned hands-on:
- Methodology to gather/collect, query/extract, transform/restructure, interpret/analyze and visualize data
- Tools like Excel, Power BI, Jupyter Notebook, Python and R
- The process for building a predictive solution with Machine Learning in Azure
By the way, I did not have “Ethics and Law in Data Analytics” and “Analytics Story Telling for Impact” in my curriculum. I can see why Microsoft added them later. They are highly relevant, and they resonate with me. I recall reading a case where a person was sent to jail based on the probability of A given B but was later acquitted based on the probability of B given A. Being a subject matter expert could put one in a powerful position. Spider-Man's "With great power comes great responsibility" quote flashed through my mind.
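To see why the two conditional probabilities can differ so much, here is a minimal sketch using Bayes' theorem. The events and every number in it are hypothetical and chosen only for illustration; they are not taken from the case mentioned above.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) from P(A), P(B | A) and P(B | not A) using Bayes' theorem."""
    # Total probability of observing B, whether or not A is true.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical setup (assumption): A = "the person is the true source",
# B = "the evidence matches the person".
prior_a = 1 / 10_000        # assumed: 1 in 10,000 people could be the source
p_b_given_a = 0.99          # assumed: a true source matches 99% of the time
p_b_given_not_a = 0.001     # assumed: an unrelated person matches 0.1% of the time

p_a_given_b = posterior(prior_a, p_b_given_a, p_b_given_not_a)

print(f"P(B | A) = {p_b_given_a:.2f}")   # how likely a match is if A is true
print(f"P(A | B) = {p_a_given_b:.2f}")   # how likely A is, given a match
```

With these made-up inputs, P(B | A) is 0.99 while P(A | B) comes out at roughly 0.09, so reading one as the other would badly overstate the evidence.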
The process I went through to learn to tell stories with data can be divided into four steps:
- The first rule of engagement: “go get the data”.
I was exposed to hands-on labs that required me to query data from a relational database using T-SQL, or to extract data from a spreadsheet or CSV file, using tools like Excel, Power BI, Jupyter Notebook, Python and R.
- Second step: “data being there does not necessarily mean it is trustworthy, usable or even relevant.”
In order to “make data trustworthy and usable”, I had to determine whether the data was sufficient, e.g. whether it had enough data points and was of good quality (i.e. not just precise but also accurate). I worked through how to deal with missing values, duplicate entries and outliers. I used descriptive statistics, e.g. sum, average, standard deviation, skewness and distribution, to get to know the characteristics of the data.
- Third step: “show what you want to tell using data and charts”
I produced all kinds of graphs, e.g. pie, bar, histogram, line, box plot and scatter charts, using the same tools I had used to extract the data in the first step. Up until this step, I was within the “basic data analysis” scope.
- Fourth step: “use data to build a predictive solution”
It is here that I stepped into “advanced data analysis” where I built predictive solutions with Machine Learning in Azure.
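As an illustration of the first two steps, here is a minimal Python sketch that uses only the standard library. The CSV content, the column names and the helper functions (`load_rows`, `clean_rows`, `flag_outliers`) are all hypothetical. Dropping rows with missing values and flagging outliers with the 1.5 × IQR rule are just one possible set of choices, not necessarily what the program's labs prescribed.

```python
import csv
import io
import statistics

# Hypothetical CSV content standing in for a real export (assumption).
RAW_CSV = """id,age,income
1,34,52000
2,45,61000
2,45,61000
3,,58000
4,29,49000
5,51,1000000
6,38,55000
7,41,58000
8,27,48000
9,36,53000
"""


def load_rows(text):
    """Step 1, "go get the data": parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Step 2: drop exact duplicate rows and rows with any missing value."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(value == "" for value in row.values()):
            continue  # missing value; imputing it would be an alternative
        cleaned.append(
            {"id": int(row["id"]), "age": int(row["age"]), "income": float(row["income"])}
        )
    return cleaned


def flag_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


rows = load_rows(RAW_CSV)
cleaned = clean_rows(rows)
incomes = [row["income"] for row in cleaned]

# Descriptive statistics to get to know the data.
print("rows before/after cleaning:", len(rows), len(cleaned))
print("mean income:", statistics.mean(incomes))
print("median income:", statistics.median(incomes))
print("standard deviation:", statistics.stdev(incomes))
print("possible outliers:", flag_outliers(incomes))
```

Comparing the mean with the median is a quick way to notice that a single extreme value is pulling the average, which is the kind of check that decides whether the data is usable as-is.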
The Learning Process
How I went through this program was no less interesting. It was an intensive self-study program with many hours of listening. The study had to be completed in 3 months, which included additional reading (and Google searches) to help me get through the hands-on labs and the graded quizzes and exams. After 3 months, with all exams passed (a minimum score of 70% was required), I would be entitled to take the grand data science exam, the “Capstone”. So, time was of the essence.
Note: I use the term “self-study”, not “self-learning”. I truly mean that because, as the expression goes, "what you remember makes you learn".
You can argue that today, one can Google, so one does not need to remember anything. I did a lot of Googling too. I don’t remember all the code construction, all the terminology – the WHAT. But I made it my top priority to learn “HOW TO” i.e. the process steps.
During the program, when my learning stopped, it was for one of the following reasons:
- I didn't truly understand the meaning of a key (English) word in the statement. I looked up the meaning in an online dictionary and I could move on.
- I didn't understand a technical term (jargon). This was the most frustrating one. Nothing seemed to stick after this. I put such terms in a parking lot in my (paper) notebook and moved on. Many times, the answers were in the hands-on lab or in another course.
- I didn't like the subject and I didn't see why I should be doing this. This one required a change of mindset. I told myself that I don’t need to like it, but I should at least find out why I don’t like it so that I can explain it. Once I knew the reason(s), it was easy to use logic to bring myself into "good to know" learning mode and move on.
- I didn't know the subject and I didn't want to spend a lot of time trying to understand it just to earn a point in a graded quiz or exam. I accepted the situation and moved on. An example of this situation was a probability question related to playing craps. I have never played craps. I don’t know the rules of the game. To earn one extra point, I would have had to invest many hours studying craps. What a crap.
Then it was time for the Capstone. The “Capstone” was the ultimate test to see whether I had learned something or not. I had to build a machine learning model that predicted the rate of heart disease (per 100,000 individuals) across the United States at the county level based on socioeconomic indicators. My model had to be able to predict with a certain degree of accuracy. I had to submit a report covering the data description, data exploration, individual feature statistics, data cleaning, the machine learning algorithm I chose, the tool I used, the conclusion, the recommendation and, most importantly, the "Executive Summary". The report was reviewed by 3 other capstone participants. I also reviewed reports from 3 others from around the world.
The fun part of the capstone was the accuracy check. I uploaded my predictions to the Microsoft Professional Program Capstone Challenge website. I got feedback on how good my model was through the “RMSE” (root mean square error) score. I could also see my rank among all participants. I had 3 submissions per day, and I used all of them every day to check whether my model got better from the adjustments I made. It was not only about getting a passing accuracy score. It was a competition to be the best I could be. I must say, the leaderboard was an excellent strategy to motivate people to get to a better model.
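For readers unfamiliar with the score, here is a minimal sketch of how RMSE is computed. The function name `rmse` and the sample rates are hypothetical; the numbers are made up and are not capstone data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical heart disease rates per 100,000 for three counties (assumption).
actual_rates = [300.0, 350.0, 400.0]
predicted_rates = [310.0, 340.0, 420.0]

print(f"RMSE = {rmse(actual_rates, predicted_rates):.2f}")
```

A lower RMSE means the predictions sit closer to the true values, and because the errors are squared before averaging, a few large misses hurt the score more than many small ones.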
I started the journey in April and in July, I was a proud achiever of the Microsoft Professional Data Science certification.
A lawyer friend asked what I got out of this program. After some thought, I replied:
“I can apply data science to create business insight, which I can advance to develop intelligent solutions.”
My friend tilted her head a bit while saying “I like the part of business insight. But you have got to explain more about the intelligent solutions”.
That sounded like a fair request. Though my right brain said quietly “To tell a technical story to a lawyer would require more than some thought”, my left brain said out loud: “Facebook tag, autonomous car, Amazon recommender". Surprised that I knew this so quickly, I proudly added “These are intelligent solutions built using machine learning” with a big smile from ear to ear!