Resources related to the research conducted by the CN-Group at the TU Wien Institute of Telecommunications
Research focused on algorithms and theory for data analysis and machine learning. It mainly explores clustering, classification, outlier detection, validity methods, data analysis methodology, and big data and stream data scenarios.
Research focused on the application of data analysis methods to network security and anomaly detection at network level. Some explored topics are: traffic characterization and classification; covert channels; network attacks, anomalies and misconfigurations; analysis methodologies and frameworks; feature selection; analysis of large networks and the Internet Background Radiation (aka darkspaces).
We study the discriminant power of network features for traffic analysis, classification and attack detection at network level. We compare existing feature sets previously proposed in the literature and study new proposals. We aim to obtain lightweight vectors able to deal with modern network traffic challenges, such as: encryption, big data, stream data, fast extraction and preprocessing, prompt responses, host/flow/network behaviour modeling, network monitoring, etc.
ODTF (One-class Decision Tree Fuzzyfier) is an algorithm that wraps a linear decision tree (DT) and establishes class-membership scores based on weighted distances to decision thresholds.
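To make the idea of scoring by weighted distances to thresholds concrete, the following is a minimal illustrative sketch and not the ODTF implementation. It assumes a hypothetical representation of a decision path as a list of `(feature_index, threshold, go_left, weight)` tuples, and it maps the weighted average of signed distances to a score in (0, 1) with a logistic function. The function name `membership_score`, the `scale` parameter and the choice of a logistic mapping are all assumptions made for illustration.

```python
import math


def membership_score(sample, path, scale=1.0):
    """Soft class-membership score in (0, 1) for one decision path.

    `path` is a hypothetical list of (feature_index, threshold, go_left, weight)
    tuples describing the splits that lead to a leaf. This is an illustrative
    simplification, not the published ODTF procedure.
    """
    total_weight = sum(weight for _, _, _, weight in path)
    if total_weight == 0:
        return 0.5  # no usable thresholds: the score is undecided

    margin = 0.0
    for feature, threshold, go_left, weight in path:
        # Signed distance is positive when the sample lies on the side of the
        # threshold that the path follows, negative otherwise.
        if go_left:
            signed = threshold - sample[feature]
        else:
            signed = sample[feature] - threshold
        margin += weight * signed

    margin /= total_weight
    # Logistic squashing: 0.5 on the boundary, close to 1 deep inside the region.
    return 1.0 / (1.0 + math.exp(-margin / scale))
```

A sample that sits exactly on every threshold gets a score of 0.5, while samples far inside the region defined by the path approach 1 and samples far outside approach 0.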
SDO (Sparse Data Observers) is an algorithm that establishes distance-based outlierness scores on data samples. SDO is devised to be embedded in systems or frameworks that operate autonomously and must process large amounts of data in a continuous manner. SDO is a machine learning solution for Big Data and stream data applications.
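As a rough illustration of a distance-based outlierness score built on a small set of observers, the sketch below samples a few data points as observers, discards observers that rarely appear among the nearest observers of the data, and scores a point by its median distance to its closest remaining observers. It is a simplified reading and not the reference SDO implementation: the function names, the default parameters `k`, `x` and `idle_fraction`, and the pruning rule based on the mean observation count are assumptions made for this example.

```python
import math
import random
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def fit_observers(data, k=20, x=3, idle_fraction=0.3, seed=0):
    """Pick k observers at random and drop the rarely used ones.

    Each data point "observes" its x nearest observers. Observers whose
    count falls below idle_fraction * mean count are removed (assumed rule).
    """
    rng = random.Random(seed)
    observers = rng.sample(data, k)
    counts = [0] * k
    for point in data:
        order = sorted(range(k), key=lambda i: euclidean(point, observers[i]))
        for i in order[:x]:
            counts[i] += 1
    cutoff = idle_fraction * (sum(counts) / k)
    return [obs for obs, c in zip(observers, counts) if c >= cutoff]


def outlierness(point, observers, x=3):
    """Median distance from the point to its x nearest observers."""
    distances = sorted(euclidean(point, obs) for obs in observers)
    return median(distances[:x])
```

Because the model is only the list of retained observers, scoring a new sample costs a handful of distance computations, which is the property that makes this family of methods attractive for large or continuously arriving data.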
With the NTA Database we aim to collect relevant information about the research in network traffic analysis conducted in recent years. To this end, we have curated related papers from journals and conferences and stored the extracted data in JSON files. You can access database files, first meta-analysis results and data structure descriptions here.
CTC datasets consist of a mix of preprocessed network traffic data with and without covert timing channels. They are a demanding challenge for machine learning and data mining algorithms.
MDCGen is a tool for generating multidimensional synthetic datasets. It is devised for testing, evaluating and benchmarking clustering algorithms.
GOI provides a set of indices for absolute cluster validation and for the interpretation of the dataset context based on geometrical properties of the multidimensional data.
We analyzed captures from the Internet Background Radiation (aka darkspaces) by using the AGM format. Time series, plots, descriptions of classes and datasets are available for consultation.