Statistics
Measures of central tendency
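The examples in this section calculate measures of central tendency for a data set contained in a Rust array. An empty data set may have no mean, median or mode to calculate, so each function returns an [Option] for the caller to handle.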
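The first example calculates the mean (the sum of all measurements divided by the number of measurements) by producing an iterator of references over the data, using [sum] and [len] to determine the total and the count of values respectively.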
fn main() {
    let data = [3, 1, 6, 1, 5, 8, 1, 8, 10, 11];

    let sum = data.iter().sum::<i32>() as f32;
    let count = data.len();

    // The mean is undefined for an empty data set, hence the `Option`.
    let mean = match count {
        positive if positive > 0 => Some(sum / count as f32),
        _ => None
    };

    println!("Mean of the data is {:?}", mean);
}
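The caller-side handling of the `Option` described above can be sketched as follows. This is a minimal illustration rather than part of the original recipe: `mean_of` and `describe_mean` are hypothetical helper names, and the body only wraps the same `sum` / `len` computation in a function.

```rust
/// Hypothetical helper: arithmetic mean of a slice, or `None` if it is empty.
fn mean_of(data: &[i32]) -> Option<f32> {
    if data.is_empty() {
        return None;
    }
    let sum = data.iter().sum::<i32>() as f32;
    Some(sum / data.len() as f32)
}

/// Hypothetical helper: the caller decides what an empty data set means.
fn describe_mean(data: &[i32]) -> String {
    match mean_of(data) {
        Some(mean) => format!("mean = {}", mean),
        None => String::from("no data"),
    }
}
```

Calling `describe_mean(&[])` takes the `None` branch instead of dividing by zero, which is the case the `Option` return type exists for.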
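The second example calculates the median using the quickselect algorithm, which avoids a full [sort] by sorting only those partitions of the data set that might contain the median. It uses [cmp] and [Ordering] to decide succinctly which partition to examine next, and [split_at] to choose an arbitrary pivot for the next partition at each step.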
use std::cmp::Ordering;

// Splits `data` around its first element (the pivot) into the values
// smaller than the pivot, the pivot itself, and the remaining values.
fn partition(data: &[i32]) -> Option<(Vec<i32>, i32, Vec<i32>)> {
    match data.len() {
        0 => None,
        _ => {
            let (pivot_slice, tail) = data.split_at(1);
            let pivot = pivot_slice[0];
            let (left, right) = tail.iter()
                .fold((vec![], vec![]), |mut splits, next| {
                    if *next < pivot {
                        splits.0.push(*next);
                    } else {
                        splits.1.push(*next);
                    }
                    splits
                });

            Some((left, pivot, right))
        }
    }
}

// Returns the element that would sit at index `k` if `data` were sorted.
fn select(data: &[i32], k: usize) -> Option<i32> {
    let part = partition(data);

    match part {
        None => None,
        Some((left, pivot, right)) => {
            let pivot_idx = left.len();

            match pivot_idx.cmp(&k) {
                Ordering::Equal => Some(pivot),
                Ordering::Greater => select(&left, k),
                // Skip the left partition and the pivot itself.
                Ordering::Less => select(&right, k - (pivot_idx + 1)),
            }
        },
    }
}

fn median(data: &[i32]) -> Option<f32> {
    let size = data.len();

    match size {
        // An empty data set has no median; this arm also keeps
        // `(even / 2) - 1` below from underflowing.
        0 => None,
        even if even % 2 == 0 => {
            let fst_med = select(data, (even / 2) - 1);
            let snd_med = select(data, even / 2);

            match (fst_med, snd_med) {
                (Some(fst), Some(snd)) => Some((fst + snd) as f32 / 2.0),
                _ => None
            }
        },
        odd => select(data, odd / 2).map(|x| x as f32)
    }
}

fn main() {
    let data = [3, 1, 6, 1, 5, 8, 1, 8, 10, 11];

    let part = partition(&data);
    println!("Partition is {:?}", part);

    let sel = select(&data, 5);
    println!("Selection at ordered index {} is {:?}", 5, sel);

    let med = median(&data);
    println!("Median is {:?}", med);
}
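For comparison, the standard library's slice method `select_nth_unstable` performs the same kind of selection in place. The sketch below is an alternative to the hand-written quickselect above, not the recipe's own method; `median_in_place` is a hypothetical name, and it copies the input so that the caller's data keeps its order.

```rust
/// Hypothetical helper: median via the standard library's in-place selection.
fn median_in_place(data: &[i32]) -> Option<f32> {
    if data.is_empty() {
        return None;
    }
    // Work on a copy so the caller's data keeps its order.
    let mut values = data.to_vec();
    let mid = values.len() / 2;
    // After this call, the element at `mid` is the one a full sort would
    // place there; everything before it is <= it, everything after is >= it.
    let (lower_part, upper, _) = values.select_nth_unstable(mid);
    let upper = *upper;
    if data.len() % 2 == 1 {
        Some(upper as f32)
    } else {
        // Even length: the other middle element is the largest of the lower part.
        let lower = *lower_part.iter().max()?;
        Some((lower + upper) as f32 / 2.0)
    }
}
```

For an even number of elements only one selection is needed, because the lower middle element is the maximum of the slice that precedes the selected position.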
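The final example calculates the mode using a mutable [HashMap] to collect the count of each distinct integer in the set, using [fold] and the [entry] API. The most frequent value in the [HashMap] is retrieved with [max_by_key].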
use std::collections::HashMap;

fn main() {
    let data = [3, 1, 6, 1, 5, 8, 1, 8, 10, 11];

    let frequencies = data.iter().fold(HashMap::new(), |mut freqs, value| {
        *freqs.entry(value).or_insert(0) += 1;
        freqs
    });

    let mode = frequencies
        .into_iter()
        .max_by_key(|&(_, count)| count)
        .map(|(value, _)| *value);

    println!("Mode of the data is {:?}", mode);
}
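Because `HashMap` iteration order is unspecified, the example above may report different values from run to run when several values share the highest count. The following sketch shows one way to make the result deterministic; the tie-breaking rule (prefer the smallest value) and the name `mode_smallest` are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: most frequent value, preferring the smallest on a tie.
fn mode_smallest(data: &[i32]) -> Option<i32> {
    let mut frequencies: HashMap<i32, usize> = HashMap::new();
    for &value in data {
        *frequencies.entry(value).or_insert(0) += 1;
    }
    frequencies
        .into_iter()
        // Higher count wins; on equal counts the smaller value ranks higher.
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(&a.0)))
        .map(|(value, _)| value)
}
```

With the sample data the value 1 occurs three times and is returned either way; the extra comparison only matters for inputs such as `[2, 2, 1, 1]`.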
Standard deviation
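This example calculates the standard deviation and z-score of a set of measurements.
The standard deviation is defined as the square root of the variance (calculated here with f32's [sqrt]), where the variance is the sum of the squared differences between each measurement and the mean, divided by the number of measurements.
The z-score is the number of standard deviations by which a single measurement deviates from the mean of the data set.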
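Written as formulas, the definitions above read as follows. The symbols are assumptions introduced here for illustration: $x_i$ is a measurement, $N$ the number of measurements, $\mu$ the mean, $\sigma$ the standard deviation and $z_i$ the z-score of $x_i$.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}, \qquad
z_i = \frac{x_i - \mu}{\sigma}
```

Dividing by $N$ rather than $N - 1$ makes this the population form of the variance. For the sample data used throughout this section, $\mu = 54 / 10 = 5.4$, $\sigma^2 = 130.4 / 10 = 13.04$ and $\sigma \approx 3.611$, so the measurement $5$ has a z-score of $(5 - 5.4) / 3.611 \approx -0.111$.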
fn mean(data: &[i32]) -> Option<f32> {
    let sum = data.iter().sum::<i32>() as f32;
    let count = data.len();

    match count {
        positive if positive > 0 => Some(sum / count as f32),
        _ => None,
    }
}

fn std_deviation(data: &[i32]) -> Option<f32> {
    match (mean(data), data.len()) {
        (Some(data_mean), count) if count > 0 => {
            // Variance: mean of the squared differences from the mean.
            let variance = data.iter().map(|value| {
                let diff = data_mean - (*value as f32);

                diff * diff
            }).sum::<f32>() / count as f32;

            Some(variance.sqrt())
        },
        _ => None
    }
}

fn main() {
    let data = [3, 1, 6, 1, 5, 8, 1, 8, 10, 11];

    let data_mean = mean(&data);
    println!("Mean is {:?}", data_mean);

    let data_std_deviation = std_deviation(&data);
    println!("Standard deviation is {:?}", data_std_deviation);

    // Z-score: distance from the mean, measured in standard deviations.
    let zscore = match (data_mean, data_std_deviation) {
        (Some(mean), Some(std_deviation)) => {
            let diff = data[4] as f32 - mean;

            Some(diff / std_deviation)
        },
        _ => None
    };
    println!("Z-score of data at index 4 (with value {}) is {:?}", data[4], zscore);
}
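The same calculation can be extended to every measurement at once. The sketch below is an illustration under stated assumptions, not part of the original recipe: `z_scores` is a hypothetical helper, and returning `None` for constant data (where the standard deviation is zero) is a design choice made here to avoid dividing by zero.

```rust
/// Hypothetical helper: z-score of every value, using the population
/// standard deviation. Returns `None` for empty or constant data.
fn z_scores(data: &[i32]) -> Option<Vec<f32>> {
    if data.is_empty() {
        return None;
    }
    let count = data.len() as f32;
    let mean = data.iter().sum::<i32>() as f32 / count;
    let variance = data
        .iter()
        .map(|&value| {
            let diff = value as f32 - mean;
            diff * diff
        })
        .sum::<f32>()
        / count;
    let std_deviation = variance.sqrt();
    if std_deviation == 0.0 {
        // All values are equal, so dividing by zero would produce NaN.
        return None;
    }
    Some(data.iter().map(|&value| (value as f32 - mean) / std_deviation).collect())
}
```

The z-scores of a data set always sum to approximately zero, which gives a quick sanity check on the output.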