Several clustering algorithms have been introduced in the literature over the last 10 years. The choice of clustering method depends on its complexity, the amount of data, the purpose of the clustering and the predefined parameters. This case study presents three of the most widely used clustering algorithms: K-means, DBSCAN and Ward’s method.
K-means belongs to the family of partitioning spatial clustering algorithms. It is a frequently used clustering method and one of the simplest unsupervised learning algorithms. K-means defines clusters by partitioning all observations into groups, in which each observation belongs to the group with the nearest mean. The algorithm alternates between assigning points to their nearest cluster centre and recomputing the centres, and it stops when the sum of squared distances from the points to their assigned centres no longer decreases, which in general is a local rather than a global minimum. The end result of the k-means algorithm is a partitioning of the data space into Voronoi cells. Continue reading “Comparison of Clustering Algorithms: K-Means, DBSCAN and Ward’s method”
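The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard assign-then-update scheme (often called Lloyd's algorithm), not the code used in the case study; the function names `sq_dist` and `kmeans`, the `max_iter` and `seed` parameters, and the choice to initialise the centres with randomly sampled data points are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of numeric tuples) into k groups.

    Returns (centres, assignment), where assignment[i] is the index
    of the centre nearest to points[i].
    """
    rng = random.Random(seed)
    # Assumed initialisation: pick k distinct data points as starting centres.
    centres = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centre.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centres[j]))
            for p in points
        ]
        # Stop once no point changes cluster (a local minimum is reached).
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, assignment


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, labels = kmeans(data, k=2)
    print(centres, labels)
```

Because each point ends up with its nearest centre, the final centres implicitly define the Voronoi cells mentioned above. The result can depend on the initial centres, which is why practical implementations usually run several random restarts.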
Many definitions of the term spatial data have been proposed in the literature. One of the simplest describes spatial data as “information related to the space occupied by objects” (Kolatch, 2001). Moreover, spatial data can be defined as any structured or unstructured data that refers to a specific location in a certain area. The area could be a two-dimensional or a multidimensional space, such as the surface of the earth or an imaginary multidimensional space.
In data science and computer science, spatial data differ from ordinary data. Spatial data are stored in databases with a spatial extension, so they use specific data types (point, polygon, line, geometry collection etc.), formats and functions, according to the capabilities of each database management system. As a result, Spatial Data Mining (SDM) methods differ from those used in mining regular data. Continue reading “Overview of Spatial Data Mining Techniques and Spatial Clustering Algorithms”
As data mining tasks become more crucial day by day, the number of data mining tools and techniques is increasing rapidly. A significant number of software packages have been developed that provide scientists and analysts with the appropriate tools to perform data mining tasks and apply mining algorithms.
Some of the most frequently used technologies for data mining are programming languages such as R, Python, Java, Scala and Julia. There are also plenty of desktop applications that are used for data mining activities. The list describes five of the most widely used tools for data mining. Continue reading “List of the most widely used tools for Data Mining”
The need to analyse, process and extract knowledge from large amounts of data has been a critical subject for computer scientists and researchers since the early years of databases. However, in the last decade, the speed at which data is created and stored has increased exponentially and everything indicates that it will continue to grow. It is estimated that almost 2.5 quintillion bytes of data are created daily.
The main reasons why this deluge of data is growing so fast are: (a) the development of efficient software and management systems that made it possible to store and process large amounts of data, (b) the growth of the global internet population, (c) the increase in cloud-based services and platforms, (d) the evolution of mobile technology and, most of all, (e) the heavy use of the internet in everyday life, including social media applications. Continue reading “An introduction to Knowledge Discovery in Databases and Data Mining”