**Syllabus of**

Techniques of Data Mining for Transportation

**Instructor:** Professor Shuyan Chen, Transportation College, Southeast University

**No.** S021178 **Credits:** 1.5

**Course Description:**

The motivation for this course comes from the development of information technology. The amount of traffic data collected is growing at an increasing rate. At the same time, the users of these data expect more sophisticated analysis of these large data sets. The field of data mining has developed over the last decade to address this problem.

Data mining is often defined as the discovery of useful but hidden patterns or relationships in a database, and it is one of the most active fields in computer science. Finding patterns, trends, and outliers in these datasets, and summarizing them with simple quantitative models, is one of the grand challenges of the information age: turning data into knowledge.

Data mining programs search through datasets for hidden relationships and patterns. This approach is particularly relevant to intelligent transportation systems, and it can help traffic researchers and managers solve traffic problems. Data mining is therefore a good field of study not only for computer science students but also for transportation students, because the same techniques can be applied to many traffic problems that may arise during their future careers.

This course provides an introduction to data mining as applied to transportation systems. It covers the basic concepts of data mining as well as specific applications to transportation systems.

**Prerequisites:**

Knowledge of probability, statistics, and linear algebra at the undergraduate level; basic knowledge of traffic engineering; and basic programming skills.

**Textbook:**

Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 3rd edition, 2011.

**Reference books:**

Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, San Francisco: Morgan Kaufmann Publishers, 3rd edition, 2011.

Charu C. Aggarwal, Data Mining: The Textbook, Springer, May 2015.

Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson, 1st Edition, 2005.

Christopher M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics series, Springer, 2006.

Required handouts will be provided by the instructor.

**Course Objectives:**

The objectives of the course are to present the basic concepts of data mining and the principles and ideas underlying its practice, including decision trees, support vector machines, neural networks, ensemble learning, and instance-based learning. The instructor will explain what the techniques are, what they can do, how they are used, and how they work.

Specific theoretical and mathematical topics will be kept to a minimum in order to focus on applications of these techniques to transportation.

After completing this course, students will have the ability to understand the fundamental terms and concepts of data mining, and to use the methods taught in class for the analysis and processing of real transportation data.

**Tentative Course Outline:**

**Chapter 1. Introduction to data mining**

1.1 Motivation: Why data mining?

1.2 What is data mining?

1.3 Data Mining: On what kind of data?

1.4 Data mining functionality

1.5 Classification of data mining systems

1.6 Origin of Data Mining

1.7 Data Mining Techniques

**Chapter 2. Data pre-processing**

2.1 Why preprocess the data?

2.2 Data cleaning

2.3 Data integration and transformation

2.4 Data reduction

2.5 Discretization and concept hierarchy generation

2.6 Summary

**Chapter 3. Outlier mining algorithms**

3.1 Background of Outlier Detection

3.2 Statistics-based Method

3.3 Distance-based Method

3.4 Density-based Method

3.5 Case Study

3.6 Conclusions and Further Work

**Chapter 4. Decision trees**

4.1 Classification and Regression

4.2 Constructing a Decision Tree

4.3 Overfitting

4.4 Extensions

4.5 Application to Incident Detection

**Chapter 5. Support vector machine**

5.1 Linear SVMs

5.2 Non-linear SVMs

5.3 Multiclass

5.4 Support vector regression

5.5 Summary

**Chapter 6. Neural networks**

6.1 What are Neural Networks?

6.2 What can a Neural Network do?

6.3 Basic Concepts of NN Learning

6.4 BP Neural Network

6.5 RBF Neural Network

6.6 Comparison between RBFNN and BPNN

**Chapter 7. Ensemble learning**

7.1 General Idea of Ensemble Methods

7.2 Generating Members

7.3 Combination Schemes

7.4 Application

**Chapter 8. Instance based learning**

8.1 Overview

8.2 K-Nearest Neighbor

8.3 Distance Measures

8.4 Attribute Weight

8.5 Variant of kNN: Distance-Weighted kNN

Classes will combine lecture and discussion. Students are expected to participate actively in class discussions. Reading will be assigned for each class, and students are expected to be prepared to answer questions.

This course also requires students to complete exercises, carry out a group project, and pass the final exam.

**Exercises:** Six exercises will be provided, corresponding to the classroom teaching. Each exercise is expected to take 2 classes to finish. Each student is expected to complete the exercises and homework assignments individually and on time. Assignments should be submitted before the due date; no extensions will be given.

Students are encouraged to complete the exercises with tools such as the WEKA software, or to write code in any programming language they are familiar with.

**Project:** Six projects are provided for students to choose from. All of them focus on the practice of data mining techniques related to transportation. A project is estimated to take about 10 classes to complete. Students will form groups of three (4 members maximum) for the course project and work together on a problem of interest to them. Each group is required to give a well-prepared PowerPoint presentation to the class and to submit a final report by email on the required date. The report should be written in Microsoft Word following a provided format.

Students can use the techniques covered in this course and any programming language they are familiar with, such as C++, MATLAB, R, or Python.

**Grading Policy:**

Exercises and Assignments (in class and homework): 30%

Oral Presentation: 15%

Project Report: 15%

Final Exam: 40%