How Data Science Turned Perspiration into Inspiration: My Journey from Ignorance to Wisdom

Sumana Bhlapibul


Lead IT/Data Consultant

"It is nice to be important but it is more important to be nice." ~John Templeton


In March 2018, a friend who worked at Microsoft sent me information about “Azure Academy for Women – Data Science”. Though I did not intend to get back into the job market, I follow technology, especially technology that improves quality of life. I did not recall ever coming across the term ‘Data Science’.


I was enjoying a life with as little to do with computers as possible, and I dreaded the thought of having to sit in front of one for long hours. But I wanted to broaden my horizons, and I wanted to show the world that women are very capable. So, I decided to enroll in the program.

During the program, it felt like I had been shot to the moon with all those new tools (nerd toys) and capabilities. Tasks that used to take me days of coding were now possible with a few lines of code or a few drag-and-drop operations.

Before I shoot you into space as well, let me first share what I discovered on my Data Science journey.

"Data Science" Program

The program is designed to equip you with the basic skills to tell a story with data. I was exposed to, and learned hands-on:

  1. Methodology to gather/collect, query/extract, transform/restructure, interpret/analyze and visualize data
  2. Tools like Excel, Power BI, Jupyter Notebook, Python and R
  3. The process of building a predictive solution with machine learning in Azure

By the way, “Ethics and Law in Data Analytics” and “Analytics Story Telling for Impact” were not in our curriculum. I can see why Microsoft added them later: they are highly relevant, and they resonate with me. I recall reading about a case in which a person was sent to jail based on the probability of A given B, and was later acquitted based on the probability of B given A. Being a subject-matter expert can put one in a powerful position. Spider-Man's “With great power comes great responsibility” flashed through my mind.
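Why does the direction of a conditional probability matter so much? Bayes' theorem links the two: P(B | A) = P(A | B) × P(B) / P(A). The minimal sketch below uses entirely made-up numbers (not from the case mentioned above) to show that a tiny chance of an innocent person matching the evidence does not imply a tiny chance that a matching person is innocent. Mixing up the two is often called the "prosecutor's fallacy" or "confusion of the inverse". The function name `posterior` and all figures are hypothetical.

```python
def posterior(prior_b: float, p_a_given_b: float, p_a_given_not_b: float) -> float:
    """Return P(B | A) via Bayes' theorem.

    P(B | A) = P(A | B) * P(B) / P(A), where
    P(A) = P(A | B) * P(B) + P(A | not B) * P(not B).
    """
    p_a = p_a_given_b * prior_b + p_a_given_not_b * (1 - prior_b)
    return p_a_given_b * prior_b / p_a


# Hypothetical numbers, for illustration only:
# 50,001 people could have left the evidence, and exactly one of them is guilty.
prior_innocent = 50_000 / 50_001
p_match_given_innocent = 1 / 10_000  # an innocent person matches by coincidence
p_match_given_guilty = 1.0           # the guilty person always matches

# Here B = "innocent" and A = "evidence matches".
p_innocent_given_match = posterior(prior_innocent, p_match_given_innocent, p_match_given_guilty)

print(f"P(match | innocent) = {p_match_given_innocent:.4f}")
print(f"P(innocent | match) = {p_innocent_given_match:.4f}")
```

With these invented numbers, about 5 innocent people plus the 1 guilty person are expected to match, so P(innocent | match) is roughly 5/6 even though P(match | innocent) is only 1 in 10,000.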

The process I went through to learn to tell stories with data can be divided into four parts:

  1. The first rule of engagement: “go get the data”. 
    I was exposed to hands-on labs that required me to query data from a relational database using T-SQL, or to extract data from spreadsheets or CSV files using tools like Excel, Power BI, Jupyter Notebook, Python and R.
  2. Second step: “data being there does not necessarily mean it is trustworthy, usable or even relevant.”   
    In order to “make data trustworthy and usable”, I had to determine whether the data was sufficient, e.g. whether it had enough data points and was of good quality (not just precise, but also accurate). I worked through how to deal with missing values, duplicate entries and outliers. I used descriptive statistics, e.g. sum, average, standard deviation, skewness and distribution, to get to know the characteristics of the data.
  3. Third step: “show what you want to tell using data and charts” 
    I produced all kinds of graphs, e.g. pie, bar, histogram, line, box plot and scatter charts, using the same tools I used to extract the data in step 1. Up until this step, I was within the “basic data analysis” scope.
  4. Fourth step: “use data to build a predictive solution” 
    It is here that I stepped into “advanced data analysis” where I built predictive solutions with Machine Learning in Azure. 
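To make step 2 more concrete, here is a minimal sketch in plain Python (standard library only) that drops duplicate entries and missing values, removes outliers, and then computes the descriptive statistics mentioned above. It assumes a hypothetical list of `(county_id, value)` records and uses the common 1.5 × IQR rule for outliers; this is one widely used convention, not necessarily the one taught in the program. The names `raw`, `clean` and `describe` are illustrative.

```python
import statistics

# Hypothetical raw records: (county_id, value); None marks a missing value.
raw = [
    ("c1", 12.0), ("c2", 15.5), ("c2", 15.5),  # duplicate entry
    ("c3", None),                              # missing value
    ("c4", 14.0), ("c5", 13.5), ("c6", 98.0),  # 98.0 looks like an outlier
    ("c7", 16.0), ("c8", 14.5),
]


def clean(records):
    """Drop exact duplicates and missing values, then remove 1.5 x IQR outliers."""
    seen, rows = set(), []
    for rec in records:
        if rec in seen or rec[1] is None:
            continue
        seen.add(rec)
        rows.append(rec)
    values = [v for _, v in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(k, v) for k, v in rows if low <= v <= high]


def describe(values):
    """Sum, mean, sample standard deviation and adjusted sample skewness."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {"n": n, "sum": sum(values), "mean": mean, "stdev": sd, "skewness": skew}


cleaned = clean(raw)
stats = describe([v for _, v in cleaned])
print(stats)
```

In practice a library such as pandas would usually handle these steps, but the logic is the same: decide what counts as a duplicate, a missing value and an outlier before trusting any summary statistic.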

The Learning Process

How I went through the program was no less interesting. It was an intensive self-study course with many hours of listening. The study had to be completed in 3 months, including the additional reading (and Google searches) I needed to get through the hands-on labs and graded quizzes and exams. After 3 months, with all exams passed (a minimum score of 70% was required), I would be entitled to take the grand data science exam, the “Capstone”. So, time was of the essence.
  

Note: I use the term “self-study”, not “self-learning”, and I mean that deliberately because, as the expression goes, "what you remember makes you learn".

You can argue that today one can Google, so one does not need to remember anything. I did a lot of Googling too. I don’t remember every code construct or every term (the WHAT). But I made it my top priority to learn the “HOW TO”, i.e. the process steps.


During the program, whenever my learning stalled, it was for one of the following reasons:

  1. I didn't truly understand the meaning of a key (English) word in the statement. I looked up its meaning in an online dictionary and could move on.
  2. I didn't understand a technical term (jargon). This was the most frustrating one; nothing seemed to stick after this. I put such terms in a “parking lot” in my (paper) notebook and moved on. Many times, the answers turned up in a hands-on lab or in another course.
  3. I didn't like the subject and I didn't see why I should be doing it. This one required a change of mindset. I told myself that I didn’t need to like it, but I should at least find out why I didn’t like it. Once I knew the reason(s), it was easy to use logic to bring myself into "good to know" learning mode and move on.
  4. I didn't know the subject and I didn't want to spend a lot of time trying to understand it just to earn a point in a graded quiz or exam. I accepted the situation and moved on. An example was probability related to playing craps. I have never played craps and I don’t know the rules of the game. To earn one extra point, I would have had to invest many hours studying craps. What a crap.

 

Then it was time for the Capstone. The “Capstone” was the ultimate test of whether I had actually learned something. I had to build a machine learning model that predicted the rate of heart disease (per 100,000 individuals) across the United States at the county level, based on socioeconomic indicators. My model had to reach a required level of accuracy. I also had to submit a report covering the data description, data exploration, individual feature statistics, data cleaning, the machine learning algorithm I chose, the tool I used, conclusions, recommendations and, most importantly, the "Executive Summary". The report was reviewed by 3 other capstone participants, and I in turn reviewed the reports of 3 others from around the world.

The fun part of the capstone was the accuracy check. I uploaded my predictions to the Microsoft Professional Program Capstone Challenge website and got feedback on how good my model was in the form of an “RMSE” (root mean square error) score. I could also see my rank among all participants. I was allowed 3 submissions per day, and I used all of them every day to check whether the adjustments I made had improved my model. It was not only about getting a passing accuracy score; it was about being the best I could be in the competition. I must say, the leaderboard was an excellent strategy to motivate people to build better models.
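For readers curious about the score itself: RMSE is the square root of the average squared difference between the actual and predicted values, so it is expressed in the same units as the target (here, cases per 100,000) and lower is better. The minimal sketch below computes it in plain Python; the three county values are invented for illustration and are not capstone data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt(mean((actual - predicted) ** 2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical heart-disease rates per 100,000 for three counties (not real data).
actual = [250.0, 310.0, 198.0]
predicted = [240.0, 320.0, 200.0]
score = rmse(actual, predicted)
print(round(score, 3))  # → 8.246
```

Because the errors are squared before averaging, one badly missed county raises the score more than several small misses, which is why trimming large errors tends to move a model up a leaderboard ranked by RMSE.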

I started the journey in April, and by July I was the proud holder of the Microsoft Professional Data Science certification.

A lawyer friend asked me what I got out of this program. After some thought, I replied:

“I can apply data science to create business insights, which I can build on to develop intelligent solutions.”


My friend tilted her head slightly and said, “I like the part about business insight. But you have got to explain more about the intelligent solutions.”


That sounded like a fair request. My right brain said quietly, “Telling a technical story to a lawyer will take more than some thought.” But my left brain said out loud: “Facebook tagging, autonomous cars, Amazon recommendations.” Surprised that I came up with these so quickly, I proudly added, “These are intelligent solutions built using machine learning,” with a big smile from ear to ear!

