Photo by Jordan McDonald on Unsplash

What is an ensemble method? It constructs a set of models and predicts class labels by combining the predictions of those models. This strategic combination can reduce the total error by decreasing variance (bagging) or bias (boosting), or it can improve on the performance of any single model (stacking).
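To make the "combine the predictions" idea concrete, here is a minimal bagging sketch in plain Python. The weak learner (`train_stump`), the toy data, and the choice of 25 models are all assumptions made for illustration; they do not come from the original post.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the resampling step of bagging)."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows):
    """Hypothetical weak learner: a one-feature threshold rule.

    rows is a list of (x, label) pairs with numeric x and labels 0 or 1.
    The rule predicts `left_label` when x <= threshold, otherwise the other label.
    """
    best = None
    for threshold, _ in rows:
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def bagged_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (invented): small x values are class 0, large x values are class 1.
rng = random.Random(0)
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagged_predict(models, 0), bagged_predict(models, 10))
```

Each stump sees a slightly different resample of the data, so their individual mistakes differ; the vote averages those differences out, which is the variance-reduction effect attributed to bagging above.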

Photo by Robin Glauser on Unsplash

What is a decision tree? A supervised learning method that uses a tree-like model for classification and regression. It helps find the relationship between a large number of candidate input variables and a target variable. However, it is built by a greedy algorithm, so it is not guaranteed to produce the optimal tree that minimizes the error.

A decision tree applies if-then-else decision rules to the variables to divide a large collection of records into successively smaller sets. The goal is to explore the training data and build a model by cleanly splitting a large heterogeneous population into smaller, more homogeneous groups…
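As a rough illustration of a single greedy split, the sketch below scores every candidate rule of the form `value <= threshold` by weighted Gini impurity and keeps the best one. Gini impurity is one common homogeneity measure, chosen here as an assumption; the original post does not say which criterion it uses. The records are invented.

```python
def gini(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows):
    """Find the 'value <= threshold' rule with the lowest weighted impurity.

    rows is a list of (value, label) pairs. Returns (threshold, score),
    or (None, inf) when no threshold separates the rows into two groups.
    """
    best_threshold, best_score = None, float("inf")
    for threshold in sorted({value for value, _ in rows}):
        left = [label for value, label in rows if value <= threshold]
        right = [label for value, label in rows if value > threshold]
        if not left or not right:
            continue  # a split must produce two non-empty groups
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Invented records: (age, bought the product?)
records = [(22, "no"), (25, "no"), (31, "no"), (38, "yes"), (45, "yes"), (52, "yes")]
threshold, score = best_split(records)
print(threshold, score)
```

A full tree would repeat this search inside each resulting group. Because every split is chosen locally, the finished tree is not guaranteed to be the globally best one, which is the sense in which the method is greedy.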


We often ask multiple questions to find insightful inputs when measuring fuzzy concepts such as “service quality”, “consumer trust” or “customer loyalty”. However, this produces too many variables, including unimportant or unrelated ones, which causes a dimensionality problem. Data reduction is therefore a necessary step in this kind of marketing research. It can be divided into two parts: feature selection and feature extraction.

Feature selection and feature extraction reduce the number of variables by obtaining a set of principal variables. The algorithms behind them help us choose the relevant and significant variables automatically. …

Photo by Capturing the human heart. on Unsplash

In the previous article “Feature Extraction Using Factor Analysis in R”, we mentioned that besides factor analysis, principal component analysis is also a common way to reduce dimensionality. So here, I’m going to use the same data to introduce PCA and show its result on a perceptual map.

What is PCA?

An unsupervised mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated principal components.

Since PCA calculates a new projection of the data set and the new axes are based on the standard deviation of…
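The definition above can be sketched with NumPy as an eigendecomposition of the covariance matrix. This is an illustrative Python sketch, not the R code from the article; the function name `pca` and the synthetic data are assumptions, and the data is only centered here, not rescaled.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components.

    Returns (scores, components, explained_ratio), where the columns of
    `components` are orthonormal directions ordered by decreasing variance.
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    components = eigenvectors[:, :n_components]
    scores = X_centered @ components
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, components, explained_ratio


# Synthetic data: the second column is nearly a multiple of the first,
# so those two variables are strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    2 * base + rng.normal(scale=0.1, size=(200, 1)),
    rng.normal(size=(200, 1)),
])
scores, components, ratio = pca(X, n_components=2)
print(ratio)
```

The component directions are orthogonal to each other and the resulting scores are uncorrelated, which matches the "orthogonal transformation" and "linearly uncorrelated" parts of the definition.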

Photo by Bill Oxford on Unsplash

What is Feature Extraction? A process to reduce the number of features in a dataset by creating new features from the existing ones. The new reduced subset is able to summarize most of the information contained in the original set of features.

There are two methods in feature extraction: factor analysis and principal component analysis. I’ll first talk about factor analysis in this post.

To eliminate the correlation between a large number of variables, we use factor analysis to find the underlying factors that represent the variables depending on them. As we simplify the data, we also want to retain as much…

Photo by Edu Grande on Unsplash

What is Feature Selection? A process to filter irrelevant or redundant features from the dataset. By doing this, we can reduce the complexity of a model, make it easier to interpret, and also improve the accuracy if the right subset is chosen.

In this post, I will first demonstrate feature selection with wrapper methods in R. Here, I use the “Discover Card Satisfaction Study” data as an example.

cardData <- read.csv("Discover_step.csv", header = TRUE)

Photo by Luca Bravo on Unsplash

After data extraction, we usually need to merge the resulting files for further analysis. There may be only a few files, or there may be hundreds. With just a few, it’s easy to copy and paste the data we need into one sheet. With a large number of files, however, there’s no point in doing it manually. So here I’m going to share the three-step approach I used to put the data together using Python.

1. Import Library

import pandas as pd
import os

To merge the files, we need to use two modules, pandas for reading the CSV…
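Since the remaining steps are cut off here, the following is only a sketch of one common way to do such a merge with the two modules named above, not necessarily the author's own steps. The helper name `merge_csv_files` and the assumption that all files share the same columns are mine.

```python
import os

import pandas as pd


def merge_csv_files(folder):
    """Read every .csv file in `folder` and stack the rows into one DataFrame.

    Assumes the files share the same column layout. Raises ValueError
    (from pandas.concat) if the folder contains no CSV files.
    """
    paths = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".csv")
    )
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)
```

A call such as `merge_csv_files("data").to_csv("merged.csv", index=False)` would then write the combined table to a single file; both paths are placeholders.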

  Special thanks to my thoughtful friend for sharing his study guide
Photo by Sarah Noltner on Unsplash

SCM 516: Applied Analytics

Time Series Decomposition

  • Chronologically ordered data are referred to as a time series
  • A time series may contain one or many elements — trend, seasonality, cyclical pattern, autocorrelation, and random variation
  • Identifying these elements and separating the time series data into these components is known as decomposition
  • Exponential smoothing — more recent records are given more weight
  • Holt’s model — includes a parameter that is an estimate of the change in the series from one period to the next
  • Winters’ model — includes a parameter that…
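The "more recent records are given more weight" idea from the list above can be sketched as simple exponential smoothing. The smoothing constant `alpha`, the choice to start from the first observation, and the demand numbers are illustrative assumptions, not values from the course notes.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    Each smoothed value is alpha * (current observation) plus
    (1 - alpha) * (previous smoothed value), so the weight of an old
    observation decays geometrically as newer ones arrive.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]  # assumption: initialise with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


demand = [10, 12, 14, 16]  # invented numbers
smoothed = exponential_smoothing(demand, alpha=0.5)
print(smoothed)  # [10, 11.0, 12.5, 14.25]
```

Note how the smoothed series lags behind the steadily rising data. That lag is the motivation for models that add a parameter for the period-to-period change, as in the Holt's model bullet above.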



— Surroundings


— Program Overview


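Teams: Our program runs on quarters (each roughly half a semester), and teams are reassigned every two quarters. A team usually has four people and typically includes one American and one Indian student, though there are of course exceptions.

Assignments: The split between group and individual assignments is roughly 40/60. Group work covers the more complex case studies and projects, while individual work is mostly quizzes and coding.

Exams: There are quite a lot of in-class quizzes, at least one every two weeks (none of them too difficult), plus take-home open-book exams, and a midterm or final on top of that.

Classes: Professors often ask questions in class, and classmates speak up readily. Because the classes are small, there is plenty of discussion and interaction. I think this is a great thing for international students, since it trains both thinking and speaking skills.

Software: You will need Python, R, SQL, Tableau, and Excel (PrecisionTree, Solver, @RISK). A Windows environment is recommended as the more convenient option.

Resources: The department provides a mentor and advisors (academic & career), so you can email them to schedule a consultation, and they also send out information about daily life or jobs from time to time. The department has study rooms for group discussion or individual study, plus a 24-hour study room.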

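— Curriculum


The third and fourth quarters carry the heaviest workload. Beyond coding skills and domain knowledge, the main focus is carrying out the capstone project (somewhat like a co-op style internship). Because there are fewer classes, though, you also have relatively more time to work with.

— Technical Skills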

35% Supervised Machine Learning (classification and regression: decision tree, random forest, neural network, support vector machines, k-nearest neighbors, XGBoost)

20% Database Management (enterprise analytics, database, SQL)

20% Marketing Analytics (multiple linear regression, logistic regression, factor analysis, profit analysis, Bass model, CLV)

10% Data Visualization (Tableau dashboard, Excel graph)

10% Recommendation System (collaborative and content-based filtering)

5% Unsupervised Machine Learning (clustering: K-means, text mining)

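— Career Prospects

Business Analyst and Data Analyst: Most students look for one of these two roles after graduating. The work leans toward data cleaning and interpreting, and toward giving the company or its clients recommendations for business decisions.

Program Manager: More and more people are considering this direction. It is a leadership position that requires both communication and technical skills (you need to understand the underlying architecture), reconciling the needs of both sides and making decisions.

Data Scientist: Some people go this way too, but it demands stronger technical skills. If you only study BA without pushing yourself further (modifying algorithms, building models), you generally won't be up to the job.


— Reflections



Business analytics covers a lot of ground, and nearly every industry now needs people with these skills. The US market, however, is gradually becoming saturated, so my advice is to find a field you enjoy (marketing, healthcare, logistics, customer service, education, etc.) and go deep in it. Combining it with what you studied in college is also a good approach. That way your job search will have more direction, and you may stand out and be more in demand. Beyond that, of course, practice SQL and Python a lot (especially SQL); once you start job hunting and interviewing, you will find these are the essential basics.

Thank you for reading to the end. I will keep sharing course-related content, how I have applied what I learned, my job-hunting process and methods, and my observations and thoughts. If you are interested, feel free to follow and share. If you have questions, you can also email me: or leave me a message on my Instagram: sct.k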


As a data journalist, a storyteller, or even a data analyst, teaching yourself Python, R, Tableau or Power BI can be challenging at first. However, there are open-source resources online that already provide the models, structures, and ideas that newcomers need to get started.

In this post, I will cover 6 helpful methods for presenting data, ordered by how cool and amazing I think they are:

1. Interactive Q & A

Kelly Szutu

Journalist x Data Visualization | Data Analyst x Machine Learning | Python, SQL, Tableau | LinkedIn:
