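Spatial analysis operations cover three aspects.
For details, see GIS Theory and Technology – Assignment 2: A Review of GIS Research, section “Spatial Analysis”; they are not repeated here.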
This refers to conventional statistical analysis of the attribute data stored in a GIS geospatial database.
Related concepts
Central tendency of the data
Dispersion of the data
Distribution of the data
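Kurtosis [kɝ'tosɪs]: \(\beta=\frac{V_4}{\sigma^4}=\frac{\frac{\sum(X-\bar{X})^4f}{\sum f}}{\sigma^4}\), where \(V_4\) is the fourth central moment, \(f\) is the frequency of each value \(X\), and \(\sigma\) is the standard deviation.
Graphical representation of data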
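As a quick numeric check of the kurtosis coefficient \(\beta = V_4/\sigma^4\) defined above, here is a minimal Python sketch for frequency-weighted data. It assumes \(\sigma\) is the population standard deviation (dividing by \(\sum f\), the same denominator as \(V_4\)); the function name `kurtosis_beta` is hypothetical.

```python
import math


def kurtosis_beta(values, freqs):
    """Moment coefficient of kurtosis for frequency-weighted data.

    beta = V4 / sigma**4, with V4 = sum((X - mean)**4 * f) / sum(f)
    and sigma**2 = sum((X - mean)**2 * f) / sum(f) (population variance).
    """
    if not values or len(values) != len(freqs):
        raise ValueError("values and freqs must be non-empty and of equal length")
    total = math.fsum(freqs)
    mean = math.fsum(x * f for x, f in zip(values, freqs)) / total
    var = math.fsum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total
    if var == 0:
        raise ValueError("kurtosis is undefined when the variance is zero")
    v4 = math.fsum((x - mean) ** 4 * f for x, f in zip(values, freqs)) / total
    return v4 / var ** 2  # sigma**4 == var**2


# Three equally frequent values 1, 2, 3
print(kurtosis_beta([1, 2, 3], [1, 1, 1]))
```

For this small sample β ≈ 1.5. A normal distribution has β = 3, so smaller values indicate lighter tails than the normal; some software reports the excess kurtosis β − 3 instead.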
Exploratory Spatial Data Analysis (ESDA)
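Statistics is the main tool of data analysis. Many statistical methods assume that the data come from a normally distributed population and build their models and inferences on that assumption. In practice, however, much data does not satisfy the normality assumption, and models based on the mean, the variance and similar quantities lack robustness on real data, so many statistical methods cannot meet the demands of analyzing massive datasets. In the 1960s, John Tukey, focusing on data analysis itself, proposed a new approach: exploratory data analysis (EDA).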
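Characteristics of exploratory data analysis (EDA): it makes no assumptions about the population the data come from, and hypothesis testing is often left out. The technique analyzes and describes the features of the data with statistical charts, graphs and summary statistics. The core of EDA:
“Let the data speak”
Build more complex models only after the data have been explored.
Methods range from simple statistical calculations to advanced multivariate statistical methods for exploring patterns in multivariate datasets.
Visual exploratory data analysis.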
Common graphical methods include:
Histogram
Stem-and-leaf plot
4 | 4 6 7 9
5 |
6 | 3 4 6 8 8
7 | 2 2 5 6
8 | 1 4 8
9 |
10 | 6
key: 6|3=63
leaf unit: 1.0
stem unit: 10.0
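The stem-and-leaf plot above can be regenerated with a short Python sketch. The data values are read back from the plot using its key (6|3 = 63, stem unit 10, leaf unit 1); the function name `stem_and_leaf` is hypothetical, and the sketch assumes a non-empty list of non-negative integers.

```python
def stem_and_leaf(values, stem_unit=10):
    """Return stem-and-leaf rows (leaf unit 1) as strings like '6 | 3 4'."""
    leaves_by_stem = {}
    for v in sorted(values):
        stem, leaf = divmod(v, stem_unit)  # 106 -> stem 10, leaf 6
        leaves_by_stem.setdefault(stem, []).append(leaf)
    rows = []
    # Walk every stem between min and max so empty stems stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = leaves_by_stem.get(stem, [])
        row = f"{stem} |"
        if leaves:
            row += " " + " ".join(str(leaf) for leaf in leaves)
        rows.append(row)
    return rows


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
for row in stem_and_leaf(data):
    print(row)
```

Stems with no leaves (5 and 9) are kept as empty rows, so gaps in the data remain visible, as in the plot above.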
Box plot
The “interquartile range”, abbreviated “IQR”, is just the width of the box in the box-and-whisker plot. That is, IQR = Q3 – Q1. The IQR can be used as a measure of how spread-out the values are. Statistics assumes that your values are clustered around some central value. The IQR tells how spread out the “middle” values are; it can also be used to tell when some of the other values are “too far” from the central value: by Tukey's rule, a value below Q1 – 1.5×IQR or above Q3 + 1.5×IQR is flagged. These “too far away” points are called “outliers”, because they “lie outside” the range in which we expect them.
(Why one and a half times the width of the box? Why does that particular value demarcate the difference between “acceptable” and “unacceptable” values? Because, when John Tukey was inventing the box-and-whisker plot in 1977 to display these values, he picked 1.5×IQR as the demarcation line for outliers. This has worked well, so we’ve continued using that value ever since.)
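A minimal Python sketch of the 1.5×IQR rule described above, applied to the stem-and-leaf data. Quartile conventions differ between tools; this sketch assumes Python's `statistics.quantiles` with its default (`"exclusive"`) method, so other software may report slightly different fences. The function names are hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print(tukey_fences(data))
print(tukey_outliers(data))
```

Here Q1 = 56 and Q3 = 78.5, so the fences are 22.25 and 112.25 and no value of this sample is flagged; in `[1, 2, 3, 4, 5, 100]`, by contrast, the value 100 falls above the upper fence.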
Adjusted box plots are intended for skewed distributions. They rely on the medcouple, a robust measure of skewness. For a medcouple value of MC, the lengths of the upper and lower whiskers are respectively defined to be \[1.5 \times IQR \times e^{3 MC}, \qquad 1.5 \times IQR \times e^{-4 MC} \qquad \text{if } MC \geq 0\] and \[1.5 \times IQR \times e^{4 MC}, \qquad 1.5 \times IQR \times e^{-3 MC} \qquad \text{if } MC \leq 0.\] Observe that for symmetrical distributions the medcouple is zero, and this reduces to Tukey’s box plot with equal whisker lengths of \(1.5 \times IQR\) for both whiskers.
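The adjusted whisker lengths above can be sketched in Python with a naive O(n²) medcouple: the median of the kernel \(h(x_i, x_j) = \frac{(x_j - m) - (m - x_i)}{x_j - x_i}\) over pairs with \(x_i \le m \le x_j\), where \(m\) is the sample median. To keep the tie rules short, this sketch assumes all values are distinct (it raises otherwise), and quartiles come from `statistics.quantiles`. The function names are hypothetical; for real work a library implementation such as `statsmodels.stats.stattools.medcouple` is preferable.

```python
import math
import statistics


def medcouple(values):
    """Naive O(n^2) medcouple; this sketch assumes distinct values."""
    xs = sorted(values)
    if len(set(xs)) != len(xs):
        raise ValueError("this sketch assumes distinct values (no ties)")
    m = statistics.median(xs)
    lower = [x for x in xs if x <= m]  # candidates for x_i
    upper = [x for x in xs if x >= m]  # candidates for x_j
    kernel = []
    for a in lower:
        for b in upper:
            if a == b:
                # Only the median element itself (odd n); the medcouple's
                # tie rule assigns 0 to this single pair.
                kernel.append(0.0)
            else:
                kernel.append(((b - m) - (m - a)) / (b - a))
    return statistics.median(kernel)


def adjusted_fences(values, k=1.5):
    """Return the (lower, upper) fences of the adjusted box plot."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:
        below, above = k * iqr * math.exp(-4 * mc), k * iqr * math.exp(3 * mc)
    else:
        below, above = k * iqr * math.exp(-3 * mc), k * iqr * math.exp(4 * mc)
    return q1 - below, q3 + above


skewed = [0, 1, 2, 3, 6, 10, 20]  # illustrative right-skewed sample
print(medcouple(skewed), adjusted_fences(skewed))
```

For the symmetric sample `[1, 2, 3, 4, 5]` the medcouple is 0 and the fences coincide with Tukey's; for the right-skewed sample the medcouple is positive, so the upper fence moves further out than Tukey's while the lower one moves in.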
Scatter plot
Given a set of variables X1, X2, … , Xk, the scatter plot matrix contains all the pairwise scatter plots of the variables on a single page in a matrix format. That is, if there are k variables, the scatter plot matrix will have k rows and k columns and the ith row and jth column of this matrix is a plot of Xi versus Xj.
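The layout of a scatter plot matrix can be sketched without any plotting library: the Python function below (a hypothetical helper) builds the k×k grid of point lists that a plotting tool would then draw. It reads “a plot of Xi versus Xj” as Xi on the vertical axis and Xj on the horizontal axis.

```python
def scatter_matrix_panels(columns):
    """Build the k-by-k grid of point lists for a scatter plot matrix.

    columns: k equal-length sequences X1..Xk.
    Panel (i, j) holds the points (x_j, x_i), i.e. X_i (vertical axis)
    plotted against X_j (horizontal axis).
    """
    k = len(columns)
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all variables need the same number of observations")
    return [[list(zip(columns[j], columns[i])) for j in range(k)]
            for i in range(k)]


panels = scatter_matrix_panels([[1, 2], [3, 4], [5, 6]])
print(len(panels), len(panels[0]))  # 3 rows, 3 columns for k = 3
```

The diagonal panels plot each variable against itself, so their points fall on a line; this is why MATLAB's `plotmatrix(X)` draws histograms on the diagonal instead.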
MATLAB's plotmatrix function:
plotmatrix(X,Y)
creates a matrix of subaxes containing scatter plots of the columns of X against the columns of Y. If X is p-by-n and Y is p-by-m, then plotmatrix produces an n-by-m matrix of subaxes.
Parallel coordinate plot
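Exploratory spatial data analysis (ESDA) extends exploratory data analysis (EDA) to spatial data analysis (SDA), tightly coupling the statistical analysis of the data with their locations on a map.
ESDA = EDA + SDA
ESDA is used to:
summarize the properties of spatial data;
explore patterns in spatial data;
generate hypotheses about geographic data;
identify where anomalous data are located on the map;
detect whether hotspots exist.
Maps can locate cases and their spatial relationships, and they play an important role in analyzing, testing and presenting the results of models.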
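To illustrate the ESDA step of locating anomalous data on the map, here is a minimal Python sketch that flags attribute outliers with Tukey's 1.5×IQR rule and keeps each flagged feature's coordinates so it can be drawn. The `(id, x, y, value)` tuple layout, the function name and the sample features are all hypothetical, and the sketch covers only this attribute-outlier step, not spatial statistics such as hotspot detection.

```python
import statistics


def locate_outliers(features, k=1.5):
    """Return the features whose attribute value lies outside the Tukey fences.

    Each feature is a hypothetical (feature_id, x, y, value) tuple, so the
    flagged features keep their coordinates for display on a map.
    """
    features = list(features)
    values = [value for _, _, _, value in features]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [f for f in features if f[3] < lower or f[3] > upper]


# Hypothetical features at the centres of a 3-by-2 grid of cells
features = [
    ("A", 0, 0, 1), ("B", 1, 0, 2), ("C", 2, 0, 3),
    ("D", 0, 1, 4), ("E", 1, 1, 5), ("F", 2, 1, 100),
]
for fid, x, y, value in locate_outliers(features):
    print(f"{fid} at ({x}, {y}): value {value}")
```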