Data Mining with Big Data
Professor Xindong Wu
Department of Computer Science
University of Vermont, USA
Time: Sunday, December 29, 2013, 09:30-10:10
Venue: Multi-function Hall, Science and Art Center, Xiamen University
Abstract: Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including physical, biological and bio-medical sciences. This talk presents a HACE theorem that characterizes the features of the Big Data revolution, and discusses a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. Challenging issues in the data-driven model and also in the Big Data revolution are analyzed.
Professor Xindong Wu is a Professor of Computer Science at the University of Vermont (USA), a Yangtze River Scholar in the School of Computer Science and Information Engineering at the Hefei University of Technology (China), and a Fellow of the IEEE and the AAAS. He holds a PhD in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, Big Data analytics, knowledge-based systems, and Web information exploration. Dr. Wu is the Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), the Editor-in-Chief of Knowledge and Information Systems (KAIS), the Founding Chair (2002-2006) of the IEEE Computer Society Technical Committee on Intelligent Informatics (TCII), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He was the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE) between January 2005 and December 2008, and has served as Program Committee Chair/Co-Chair for ICDM '03, KDD-07, CIKM 2010, and ASONAM 2014. Professor Wu is the 2004 ACM SIGKDD Service Award winner, the 2006 IEEE ICDM Outstanding Service Award winner, and a 2012 IEEE Computer Society Technical Achievement Award recipient "for pioneering contributions to data mining and applications".
How to classify categorical-and-numerical-attribute data in big data analytics?
Professor Yiu-ming Cheung
Department of Computer Science
Hong Kong Baptist University, China
Time: Sunday, December 29, 2013, 10:10-10:50
Venue: Multi-function Hall, Science and Art Center, Xiamen University
Abstract: Every day, over 2.5 quintillion bytes of data are created across all areas of our social life. Such huge amounts of data are difficult to handle with traditional data management and processing tools. Therefore, novel and efficient analysis techniques are needed to extract the most valuable pieces of information from such big data. Clustering is a very useful technique for data analysis, but it faces major challenges in the big data environment. One of these problems stems from the variety of big data: the data can be of any type, and their attributes can be numerical, categorical, or both. Unfortunately, most existing clustering approaches apply only to purely numerical or purely categorical data, not to both. In general, clustering mixed data composed of numerical and categorical attributes is a nontrivial task, because there is an awkward gap between the similarity metrics for categorical and numerical data. In our work, we therefore present a general clustering framework based on the concept of object-cluster similarity, and give a unified similarity metric that can be applied directly to data with categorical, numerical, and mixed attributes. Accordingly, we develop an iterative clustering algorithm whose outstanding performance is demonstrated experimentally on different benchmark data sets. Moreover, to circumvent the difficult problem of selecting the number of clusters, we further develop a penalized competitive learning algorithm within the proposed clustering framework. Its embedded competition and penalization mechanisms enable this improved algorithm to determine the number of clusters automatically by gradually eliminating redundant clusters. The experimental results show the efficacy of the proposed approach.
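The idea of an object-cluster similarity that works for numerical, categorical, and mixed attributes can be illustrated with a small sketch. The sketch rests on several assumptions and is not the metric from the talk:

- The similarity function is hypothetical.
- For categorical attributes, it uses the fraction of cluster members that share the object's value.
- For numerical attributes, it uses a Gaussian kernel on the distance to the cluster centroid.
- The two parts are weighted by attribute counts.
- The names `similarity`, `cluster`, and `seeds` are illustrative.
- The number of clusters is fixed by the seeds. The penalized competitive learning step that selects it automatically is not shown.

```python
import math


def similarity(obj, members):
    """Hypothetical object-cluster similarity in [0, 1] for mixed data.

    obj is a (numeric_tuple, categorical_tuple) pair; members is a
    non-empty list of such pairs. Assumption: each attribute type is
    weighted by its share of the total attribute count.
    """
    num, cat = obj
    n = len(members)
    total = 0.0
    # Categorical part: for each attribute, the fraction of members sharing the value.
    for j in range(len(cat)):
        total += sum(1 for _, c in members if c[j] == cat[j]) / n
    if num:
        # Numerical part: Gaussian kernel of squared distance to the centroid.
        centroid = [sum(m[j] for m, _ in members) / n for j in range(len(num))]
        d2 = sum((a - b) ** 2 for a, b in zip(num, centroid))
        total += math.exp(-0.5 * d2) * len(num)
    return total / (len(num) + len(cat))


def cluster(data, seeds, max_iter=20):
    """Iteratively assign each object to its most similar non-empty cluster.

    seeds: indices of the objects that start each cluster (hypothetical
    initialization; k = len(seeds) is fixed here).
    """
    clusters = [[data[i]] for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(
                (c for c in range(len(clusters)) if clusters[c]),
                key=lambda c: similarity(obj, clusters[c]),
            )
            for obj in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        clusters = [
            [obj for obj, lab in zip(data, labels) if lab == c]
            for c in range(len(seeds))
        ]
    return labels


mixed = [((0.0,), ("a",)), ((0.1,), ("a",)), ((5.0,), ("b",)), ((5.1,), ("b",))]
print(cluster(mixed, seeds=(0, 2)))
```

The same `similarity` function handles purely numerical data (empty categorical tuple), purely categorical data (empty numeric tuple), and mixed data. This mirrors the "unified metric" point in the abstract without claiming to reproduce it.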
Professor Yiu-ming Cheung received his Ph.D. degree from the Department of Computer Science and Engineering at The Chinese University of Hong Kong in 2000. He is currently a Professor in the Department of Computer Science at Hong Kong Baptist University (HKBU). He is also a senior member of the IEEE and the ACM.
His research interests include machine learning, pattern recognition, watermarking, and image and video processing. He has published over 150 papers, some of which appear in leading international journals, e.g., IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Image Processing, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Neural Networks, IEEE Transactions on SMC (Part B), and Pattern Recognition. He was the founding chair of the IEEE (Hong Kong) Chapter of the Computational Intelligence Society. He was also the program committee chair of WI'2012 and IAT'2012, organizing committee chair of WI'06, IAT'06, ICDM'06, and IDEAL'2003, web and publicity chair of WI'2003 and IAT'2003, and financial chair of IEEE CIFEr'2003. Furthermore, he has served as a program committee member of several major international conferences, including ICPR'2010 and KDD'2007. He is currently an associate editor or guest editor of several international journals, e.g., Integrated Computer Aided Engineering (SCI IF: 3.451), Knowledge and Information Systems: An International Journal (SCI IF: 2.225), and International Journal of Pattern Recognition and Artificial Intelligence (SCI IF: 0.624).
Large-scale Machine Learning
Professor Chih-Jen Lin
Department of Computer Science
National Taiwan University, China
Time: Sunday, December 29, 2013, 11:05-11:45
Venue: Multi-function Hall, Science and Art Center, Xiamen University
Abstract: Most traditional learning algorithms were designed to run on a computer with a single processor, but data larger than a machine's capacity have become common in many application areas. We address large-scale machine learning in three aspects. First, we discuss when to and when not to apply distributed learning and mining methods. Traditional machine learning algorithms focus on the computation, but we argue that in a distributed environment many other issues, such as data locality or communication cost, must be taken into consideration. Second, we discuss distributed classification algorithms, including linear and kernel Support Vector Machines (SVM), trees, and others. Third, we present approaches for distributed data clustering. Methods such as k-means, spectral clustering, and Latent Dirichlet Allocation (LDA) are briefly covered. Finally, we discuss future challenges in tackling large-scale data classification and clustering.
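The point about communication cost can be made concrete with k-means. Below is a minimal sketch of one widely used way to distribute it. This is a standard pattern and not necessarily the method covered in the talk.

- Each worker assigns its own partition to the nearest centers.
- The worker ships back only per-center coordinate sums and counts, which is O(k*d) numbers instead of its raw points.
- A driver merges these summaries into new centers.

Partitions are simulated here as in-memory lists. No distributed framework is assumed, and the function names are hypothetical.

```python
def local_stats(points, centers):
    """Worker side: per-center coordinate sums and counts for one partition."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest center by squared Euclidean distance.
        j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    return sums, counts


def merge_and_update(stats, centers):
    """Driver side: combine the small summaries and recompute the centers."""
    k, d = len(centers), len(centers[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for t in range(d):
                total_sums[j][t] += sums[j][t]
    # An empty cluster keeps its previous center.
    return [
        [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else list(centers[j])
        for j in range(k)
    ]


def distributed_kmeans(partitions, centers, iters=10):
    for _ in range(iters):
        # In a real system each local_stats call runs where its partition is stored.
        stats = [local_stats(part, centers) for part in partitions]
        centers = merge_and_update(stats, centers)
    return centers


parts = [[(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)], [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]]
print(distributed_kmeans(parts, [(0.0, 0.0), (10.0, 10.0)], iters=5))
```

Because the raw points never leave their partitions, this design respects data locality. Each iteration costs one round of small messages, which is exactly the kind of non-computational cost the abstract says must be weighed.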
Professor Chih-Jen Lin is currently a distinguished professor at the Department of Computer Science, National Taiwan University. He obtained his B.S. degree from National Taiwan University in 1993 and his Ph.D. degree from the University of Michigan in 1998. His major research areas include machine learning, data mining, and numerical optimization. He is best known for his work on support vector machines (SVM) for data classification. His software LIBSVM is one of the most widely used and cited SVM packages. For his research work he has received many awards, including the ACM KDD 2010 best paper award. He is an IEEE fellow and an ACM distinguished scientist for his contribution to machine learning algorithms and software design.
Large-scale Conformal Prediction
Professor Vladimir Vovk
Computer Learning Research Centre
Department of Computer Science
Royal Holloway, University of London
Time: Sunday, December 29, 2013, 14:00-14:40
Venue: Conference Hall No. 1, Science and Art Center, Xiamen University
Abstract: Conformal prediction is an approach to machine learning in which learning algorithms provide provably valid information about the accuracy and reliability of their predictions. Conformal predictors can be built on top of standard learning algorithms, such as support vector machines, boosting, neural networks, Bayesian algorithms, etc. A disadvantage of conformal predictors is their relative computational inefficiency for many underlying learning algorithms. Inductive conformal predictors have been designed to overcome this disadvantage. Although computationally efficient, inductive conformal predictors sacrifice different parts of the training set at different stages of prediction, which affects their informational efficiency. This talk will briefly review the methods of conformal and inductive conformal prediction, but it will emphasize the method of cross-conformal prediction, which is a hybrid of inductive conformal prediction and cross-validation. The computational efficiency of cross-conformal predictors is comparable to that of inductive conformal predictors, and their informational efficiency is close to that of conformal predictors. Moreover, cross-conformal predictors are perfectly suited to parallel computation.
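Cross-conformal prediction can be sketched for classification as follows.

1. Split the training set into K folds.
2. For each fold, use the remaining folds to score the fold's own examples and the test object under each candidate label.
3. Pool the counts into a single p-value per label: p(y) = (sum over folds of #{held-out scores >= test score} + 1) / (n + 1).
4. Output every label whose p-value exceeds the significance level epsilon.

This is a minimal sketch under stated assumptions:

- The nonconformity measure is a hypothetical toy: the distance to the nearest training example with the postulated label.
- The folds are interleaved.
- The function names are illustrative.

A real deployment would swap in a nonconformity measure derived from an underlying learner such as an SVM. Unlike full conformal prediction, the validity of cross-conformal predictors is not guaranteed in general, so the output should be read as approximate.

```python
import math


def nonconformity(train, x, y):
    """Hypothetical score: distance from x to the nearest training example
    labelled y (larger means stranger). Uses math.dist (Python 3.8+)."""
    return min((math.dist(x, xi) for xi, yi in train if yi == y), default=math.inf)


def cross_conformal_pvalues(data, x_test, labels, n_folds=3):
    """Cross-conformal p-value for each candidate label of x_test.

    data: list of (feature_tuple, label) pairs.
    """
    n = len(data)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]  # interleaved split
    counts = {y: 0 for y in labels}
    # Folds are independent, so this loop can run in parallel.
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(n) if i not in held_out]
        # Scores of the held-out examples, measured against the other folds.
        scores = [nonconformity(train, data[i][0], data[i][1]) for i in fold]
        for y in labels:
            a_test = nonconformity(train, x_test, y)
            counts[y] += sum(1 for a in scores if a >= a_test)
    return {y: (counts[y] + 1) / (n + 1) for y in labels}


def prediction_set(data, x_test, labels, epsilon=0.1, n_folds=3):
    """Labels whose cross-conformal p-value exceeds the significance level."""
    p = cross_conformal_pvalues(data, x_test, labels, n_folds)
    return {y for y, pv in p.items() if pv > epsilon}


data = [((v,), "a") for v in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)] + \
       [((v,), "b") for v in (10.0, 10.1, 10.2, 10.3, 10.4, 10.5)]
print(cross_conformal_pvalues(data, (0.25,), ["a", "b"]))
print(prediction_set(data, (0.25,), ["a", "b"], epsilon=0.1))
```

With this nearest-neighbour toy, "training" a fold is just storing the other folds. With a real learner, one model would be fitted per fold, which is the K-model cost that the abstract compares with inductive conformal prediction.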
Professor Vladimir Vovk is Professor of Computer Science at Royal Holloway, University of London. His research interests include machine learning and the foundations of probability. He was one of the founders of prediction with expert advice, an area of machine learning that avoids making any statistical assumptions about the data. In 2001 he and Glenn Shafer published a book ("Probability and Finance: It's Only a Game", New York: Wiley) on new game-theoretic foundations of probability. His second book ("Algorithmic Learning in a Random World", New York: Springer, 2005), co-authored with Alex Gammerman and Glenn Shafer, discusses conformal prediction. At this time his research centres on the theory of conformal prediction and on establishing new kinds of performance guarantees in prediction with expert advice.
Learning to Understand Documents
Dr. Tao Li
School of Computer Science
Florida International University, USA
Time: Sunday, December 29, 2013, 14:40-15:20
Venue: Conference Hall No. 1, Science and Art Center, Xiamen University
Abstract: In the Internet age, the volume and complexity of textual data (e.g., news, blogs, webpages) are growing explosively. Document understanding techniques such as document clustering and summarization help discover useful and meaningful information in documents. For example, document clustering provides an efficient way to organize web search results, and document summarization can generate informative snippets that help users explore the web. In this talk, I will present some of our research projects on developing novel data mining and machine learning techniques to improve document understanding. These include algorithms that integrate document clustering and summarization to obtain meaningful document clusters with summarized interpretations, and algorithms that summarize the differences and evolution across different document sources.
Dr. Tao Li is currently an associate professor in the School of Computing and Information Sciences at Florida International University. He received his Ph.D. in computer science from the Department of Computer Science, University of Rochester, in July 2004. His research interests are in data mining, information retrieval, and computing system management. He is a recipient of the NSF CAREER Award and multiple IBM Faculty Research Awards. He is on the editorial boards of ACM Transactions on Knowledge Discovery from Data (ACM TKDD), IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), and Knowledge and Information Systems (KAIS).
Scalable Privacy Preservation for Big Data Applications on Cloud
Dr. Jinjun Chen
Faculty of Engineering and Information Technology
University of Technology, Sydney, Australia
Time: Sunday, December 29, 2013, 15:20-16:00
Venue: Conference Hall No. 1, Science and Art Center, Xiamen University
Abstract: Cloud computing promises an open environment where customers can deploy IT services in a pay-as-you-go fashion while avoiding huge capital investment in their own IT infrastructure. Although the cloud provides a promising infrastructure for big data applications such as medical data analysis, privacy preservation becomes critical because user privacy is a serious concern in practice. In this talk, we will discuss privacy preservation in general and then propose our solutions for scalable privacy preservation of big data applications on the cloud.
Dr Jinjun Chen is an Associate Professor in the Faculty of Engineering and IT, University of Technology Sydney (UTS), Australia. He is the Director of the Lab of Cloud Computing and Distributed Systems at UTS. He holds a PhD in Computer Science and Software Engineering from Swinburne University of Technology, Australia. Dr Chen's research interests include cloud computing, big data, workflow management, privacy and security, and various related research topics. His research results have been published in more than 100 papers in high-quality journals and conferences, including IEEE Transactions on Services Computing, ACM Transactions on Autonomous and Adaptive Systems, ACM Transactions on Software Engineering and Methodology (TOSEM), IEEE Transactions on Software Engineering (TSE), and IEEE Transactions on Parallel and Distributed Systems (TPDS).
He received the Swinburne Vice-Chancellor's Research Award for early career researchers (2008), the IEEE Computer Society Outstanding Leadership Award (2008-2009 and 2010-2011), the IEEE Computer Society Service Award (2007), and the Swinburne Faculty of ICT Research Thesis Excellence Award (2007). He is an Associate Editor for IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers, and Journal of Computer and System Sciences. He is the Vice Chair of the IEEE Computer Society's Technical Committee on Scalable Computing (TCSC), Vice Chair of the Steering Committee of the Australasian Symposium on Parallel and Distributed Computing, Founder and Coordinator of the IEEE TCSC Technical Area on Workflow Management in Scalable Computing Environments, and Founder and steering committee co-chair of the International Conference on Cloud and Green Computing and the International Conference on Big Data Science and Engineering.