Fall 2022 Issue

Girl looking in distance analyzing thoughts around big data numbers

Photo credit: Adobe Stock


When Somyi Baek ’13 immigrated to the United States from South Korea, she never imagined she would pursue a career in STEM. She saw herself pursuing a degree in linguistics or hospitality. Only after she arrived did she realize that her mathematics skills were more advanced than those of her peers.

Somyi Baek '13

“I came to America when I was 16 and quickly realized I was ahead of everyone else in mathematics because of my education in South Korea. Everyone thought I was a math genius because I was good at trigonometry and factoring polynomials. It made me feel good about math and everyone was so encouraging,” Baek said affectionately.

This positive experience motivated Baek to continue her mathematics studies. She went on to take AP calculus during her junior year in high school, and things started to come together for her.

“All the mathematical subjects that I had previously learned had come together in the big picture with calculus – everything was useful. I began to understand why I learned these other tools up until this point,” she explained. “It was fascinating that calculus is used everywhere – such as engineering – and the different pieces of mathematics came together to form a very useful tool. I was intrigued by the possibility of finding out if there were any more amazing tools … this led me to pursue mathematics in college.”

While deciding on her path to college, Baek was introduced to Stockton by her high school mathematics teacher, a Stockton alumnus.

“My math teacher Eric Chancellor went to Stockton and talked about his time there. He wrote me an excellent letter of recommendation and Stockton was generous enough to offer me a full ride. I knew I was making the right decision,” she explained. 

While pursuing her mathematics degree, Baek worked on many independent studies. She fondly recalled working on a project with Dr. Wondi Geremew, associate professor of computer information systems.

“I worked as a math tutor and got to know Dr. Geremew because he had an office in the Tutoring Center. I learned his specialty was the Simplex Method and Linear Programming, and I was interested in the topic [Simplex Method or Simplex Algorithm is used for calculating the optimal solution to the linear programming problem. In other words, the simplex algorithm is an iterative procedure carried out systematically to determine the optimal solution from the set of feasible solutions. – Businessjargons.com],” Baek recalled. “... It’s one of the topics that's very mathematical but also very applied. Not a lot of people can say that they've studied it because it is a specialized topic. It gives me an edge to have it on my resume.”
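For readers curious about what the simplex method looks like in practice, here is a minimal sketch in Python. It is not taken from Baek's or Dr. Geremew's project. The function name `simplex_max` and the small example problem are illustrative assumptions, and the sketch only handles problems of the form "maximize c·x subject to A·x ≤ b and x ≥ 0" where every entry of b is non-negative.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b and x >= 0 (assumes every b[i] >= 0).

    Illustrative tableau simplex with Bland's rule; not a production solver.
    Returns (x, optimal_value) as exact fractions.
    """
    m, n = len(A), len(c)
    # Build the tableau: each constraint row gets its own slack variable.
    T = []
    for i in range(m):
        slack = [Fraction(int(i == j)) for j in range(m)]
        T.append([Fraction(v) for v in A[i]] + slack + [Fraction(b[i])])
    # Objective row stores -c; its last entry will hold the objective value.
    T.append([Fraction(-v) for v in c] + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]  # slack variables start in the basis

    while True:
        # Entering variable: lowest index with a negative objective entry.
        col = next((j for j in range(n + m) if T[-1][j] < 0), None)
        if col is None:
            break  # no improving direction left, so the solution is optimal
        # Ratio test picks the leaving row; ties broken by basis index.
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("problem is unbounded")
        row = min(rows, key=lambda i: (T[i][-1] / T[i][col], basis[i]))
        # Pivot: scale the pivot row, then eliminate the column elsewhere.
        pivot = T[row][col]
        T[row] = [v / pivot for v in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col] != 0:
                factor = T[i][col]
                T[i] = [a - factor * r for a, r in zip(T[i], T[row])]
        basis[row] = col

    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = T[i][-1]
    return x, T[-1][-1]


# Hypothetical example: maximize 3x + 5y
# subject to x <= 4, 2y <= 12, 3x + 2y <= 18.
x, value = simplex_max([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18])
```

Each pass through the loop moves from one corner of the feasible region to a neighboring corner with a better objective value, which is the "iterative procedure" the definition above describes.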

Data science lifecycle. Photo credit: EDUCA

After graduating from Stockton, Baek attended the University of Minnesota with the intention of becoming a mathematics professor, but she began reconsidering her options during her third or fourth year of graduate school. “I was weighing two different options for a long time. I realized I loved statistics … and liked coding. All the required skill sets for data science felt fun to me,” she explained.

“Data science practitioners apply machine learning algorithms to numbers, text, images, video, audio, and more to produce artificial intelligence (AI) systems to perform tasks that ordinarily require human intelligence. In turn, these systems generate insights which analysts and business users can translate into tangible business value” (DataRobot, n.d.). 

Baek shifted the direction of her studies and took a data analytics internship at a company that provides hygiene solutions for different industries around the world. Her first assignment was a complex project that required her to analyze data from a particular commercial farm and predict a contamination event that had never occurred there. The analysis proved to be quite difficult because the data classification was imbalanced [data classification refers to the process that tags and categorizes any kind of data so that it can be better understood and analyzed. In this data set, yes = contamination, and no = clean product].

Commercial farmers test their product to ensure it is safe to be consumed. Most of the time, the test results are clean. In this case, the farm had never had a contamination event; however, its operators wanted to predict when one might occur.


“It sounds impossible, but if your data doesn’t show a contamination event, how are we going to predict the next event when it has never happened before?” Baek asked.

Class balance, meaning roughly equal numbers of yes and no instances, is important in data science because a model needs examples of both outcomes to learn patterns. In this case, the data set had a severe class imbalance because it showed only clean results and no contamination events.
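To see why imbalance is a problem, consider the short Python sketch below. It does not use Baek's data. The labels are made up for illustration (98 clean results and 2 contaminations), and the helper names `class_balance`, `accuracy` and `recall` are assumptions of this sketch.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each class, e.g. {"no": 0.98, "yes": 0.02}."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive="yes"):
    """Fraction of true positives that were caught; None if there are none."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predicted_for_positives:
        return None  # with zero positive examples, recall cannot be measured
    hits = sum(p == positive for p in predicted_for_positives)
    return hits / len(predicted_for_positives)


# Hypothetical test results: 98 clean ("no") and 2 contaminated ("yes").
y_true = ["no"] * 98 + ["yes"] * 2
# A lazy "model" that always predicts a clean result.
always_no = ["no"] * len(y_true)

balance = class_balance(y_true)
acc = accuracy(y_true, always_no)
rec = recall(y_true, always_no)
```

In this made-up example, the lazy model is right 98 percent of the time yet catches none of the contaminations. When the data contain no contamination events at all, as on the farm Baek studied, even that check is unavailable, which is why a specialized approach was needed.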

“Because this is a special case of classification, it needed a very specialized tool and model. I dug deep into [research] papers and found a tool developed by social scientists to study rare diseases. The tool is used in the medical field to estimate the probability of someone having a rare disease, even though the number of that population is very small,” she said. Baek went on to explain that her project “... had a very small data set, which is never good in statistics. You always want more data that gives your model more predictive power. I had to figure out a workaround. This project gave me a good sense of what data sets are like in real life.”

This experience was soon put to good use. Baek defended her thesis in July 2020, in the midst of the pandemic. Although data science is a booming field, the virus had brought on an unprecedented hiring freeze. Even so, by August 2020, she had obtained a position as a data scientist at U.S. Bank, the fifth-largest bank in America. Most recently, Baek became a lead data scientist for the sustainability team at Target this past July.

Data science is a rapidly evolving field that requires master's level statistics and lifelong learning to stay up to date as technology evolves. 

If you are interested in pursuing Big Data, you don’t have to go far! Learn more about Stockton’s Data Science & Strategic Analytics Master’s program. Direct entry is available for students who meet the requirements, and no GRE exam is required.

Maria. (2022, May 17). Data science. DataRobot AI Cloud. Retrieved July 12, 2022, from https://www.datarobot.com/wiki/data-science/


Learn from an Expert!

Hear more from Somyi about data science and how you could pursue a career in the field.

 

Most data scientists come equipped with excellent mathematical thinking, coding skills, and a good statistical background. These are the basic skill sets required. What will make you stand out is being a good communicator. First and foremost, you’re a consultant solving business problems. It's important to be a good translator of math and statistics and to understand what matters to the business. Knowing when to drop the technical lingo, and when to bring it in and be very rigorous, is really important.

Being a math tutor was helpful for me because it made me think about how I should explain a technical math problem or concept to someone who doesn't understand it. I got a lot of practice putting myself in the other person's shoes, both as a tutor and as a teaching assistant in grad school.

If you're interested in data science, try your skills at one of the online competitions at kaggle.com. Kaggle holds many different competitions, and many prominent companies, such as Google, host their own on the site. The competitions give you the opportunity to work on data science problems – from friendly beginner problems to more advanced ones – and get ranked against other people.

Many data scientists also use Stack Exchange, an online forum where you can ask questions and get help with different languages, such as Python or SQL.

If you are trying to get better at coding, Leetcode.com is a good resource. It has different levels of difficulty, and a lot of tech companies use this platform to screen candidates. 

I also like to follow certain female data scientists in South Korea. They write about their experience in different startups and what they did to get their resume "job search ready." I like reading their blogs because I could see myself in them. However, there are a lot of people blogging about their experiences in data science on medium.com and other popular blogging platforms.

Code every day. It's hard to be uncomfortable with coding and be a data scientist. You should be able to do medium-level Leetcode problems comfortably and have a sense of how to sketch out the big idea of the code.

Python and R are high-level languages [high-level category in coding means it is more user-friendly] that data scientists use. If you're comfortable with either one, you can pick up the other very easily.  If you are interested in becoming a data scientist, it's important that you learn how to code in Python or R.

You will need at least master’s level statistics knowledge to become a data scientist; however, it is very common to see Ph.D.s in the field.

While you are an undergraduate, you can dabble in statistics and computer science courses to see if you like them. You will also need linear algebra and calculus.

It doesn’t matter what discipline your undergraduate degree is in, as long as you've acquired the skill sets. Most data scientists will have a master's degree in data science or computer science. There are also individuals who hold a Ph.D., usually in another STEM discipline (physics, mathematics, etc.). An advanced degree lends itself well to acquiring the skill set data scientists need.

It’s an exciting field, and I hope to encourage women in STEM. I think that data science as a field seems very techy, and you see more men than women. I’m hoping female students will be encouraged to see a female with a Ph.D. in mathematics and pursue it. This field is not going away; it will only get more established over time.