[Assumptions] Parametric Tests - 2) Homogeneity of Variance


Studying Data Analysis


Statistics


Eileen's 2022. 1. 14. 23:11

Parametric tests: the assumptions that must hold for parameter estimation
1. Normality: data in each group should be normally distributed
2. Homogeneity of variance: data in each group should have approximately equal variance;
  i.e., the samples come from populations with the same variance
3. Interval data: data should be measured at least at the interval level.
4. Independence: 1) independent groups: data in each group should be randomly and independently sampled from the population; 2) repeated measures: the behavior of each participant should be independent (participants should not influence one another); 3) regression: errors in the regression model should not be correlated

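Testing homogeneity of variance

- The variance of the outcome variable should be about the same in every group (i.e., at every level of the grouping variable)
e.g., the variance of group A's responses to sound 1 should be similar to the variance of group A's responses to sound 2, and to the variance of group B's responses
- Homogeneity of variance (equal variances) vs. heterogeneity of variance (unequal variances)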

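Levene's test: tests the null hypothesis that the variances of the different groups are equal; if it is significant (p < .05), the variances differ.
- Function: leveneTest() from the car package -> leveneTest(outcome variable, grouping variable, center = median or mean); run it once per outcome variable
- The grouping variable must be a factor; center defaults to median (center = mean gives the original Levene's test)
- Reporting: F(df1, df2) = value, p = value; e.g., For the percentage on the R exam, the variances were similar for Duncetown and Sussex Uni. students, F(1, 98) = 2.09, p = .15, but for numeracy scores the variances were significantly different in the two groups, F(1, 98) = 5.37, p = .023.
- Limitation: with large samples, even small differences between variances tend to come out significant (a larger sample means more power).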
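To make the mechanics of leveneTest() concrete, here is a minimal base-R sketch, assuming the usual definition: Levene's test is an ordinary one-way ANOVA on the absolute deviations of each score from its own group's center (the median by default, which is the Brown-Forsythe variant; the mean gives the original Levene test). The helper name levene_by_hand() and the simulated scores are hypothetical; with car installed, car::leveneTest(scores, uni) should give the same F and p.

```r
# Levene's test by hand: one-way ANOVA on |score - group center|.
# levene_by_hand() is a hypothetical helper for illustration, not a car function.
levene_by_hand <- function(y, group, center = median) {
  group <- as.factor(group)                    # grouping variable must be a factor
  group_center <- ave(y, group, FUN = center)  # each score's own group median (or mean)
  abs_dev <- abs(y - group_center)             # absolute deviation from that center
  tab <- anova(lm(abs_dev ~ group))            # ordinary one-way ANOVA on the deviations
  list(F = tab[["F value"]][1],
       df = tab[["Df"]],                       # c(df1, df2) = c(k - 1, N - k)
       p = tab[["Pr(>F)"]][1])
}

# Simulated exam scores (made up): same mean, different spread in the two groups.
set.seed(1)
scores <- c(rnorm(50, mean = 60, sd = 10), rnorm(50, mean = 60, sd = 20))
uni <- factor(rep(c("Duncetown", "Sussex"), each = 50))

levene_by_hand(scores, uni)                  # median-centered (car's default)
levene_by_hand(scores, uni, center = mean)   # original mean-centered Levene test
```

With two groups of 50 cases, the degrees of freedom are (2 - 1, 100 - 2) = (1, 98), the same as in the reporting example.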
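Hartley's Fmax: the variance ratio, i.e., the ratio of the largest group variance to the smallest
- Because Levene's test is affected by sample size, this offers a second check
- Judge homogeneity by comparing the variance ratio with the critical values in the table published by Hartley (the critical value depends on the number of groups and the sample size per group)
Reference: http://webspace.ship.edu/pgmarr/Geo441/Tables/Hartley's%20Fmax%20Table.pdf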
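Computing the variance ratio itself takes one line; a minimal sketch with made-up data follows. The helper name hartley_fmax() is hypothetical, and the critical value still has to be looked up in Hartley's table; this code does not compute it.

```r
# Hartley's Fmax: largest group variance divided by smallest group variance.
# hartley_fmax() is a hypothetical helper used for illustration.
hartley_fmax <- function(y, group) {
  variances <- tapply(y, as.factor(group), var)  # sample variance of each group
  max(variances) / min(variances)
}

# Made-up data: group b is twice as spread out as group a.
y <- c(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)
g <- rep(c("a", "b"), each = 5)
hartley_fmax(y, g)  # var(a) = 2.5, var(b) = 10, so Fmax = 4
```

Here there are 2 groups with n = 5 each, so the value 4 would be compared with the tabled critical value for 2 variances and n - 1 = 4.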

Box plot

cleansing data

Dealing with outliers so that the assumptions hold:
1. Remove the case (e.g., the outlier).
2. Transform the data: apply the transformation to all of the data so that the relationships between them do not change
- Which transformation? Trial and error, but each time you must apply a single method to all of the data.
- Methods (p. 245): log transformation, square-root transformation, reciprocal transformation, reverse-score transformation
*) If the test is robust, use it even when the data do not meet the assumptions
(i.e., if the statistical model stays accurate even for data that violate the assumptions, use it as is.)

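*General form of a transformation in R: new.variable <- function(old variable), e.g. new.variable <- rowSums(old variables); or a logical comparison, e.g. new.variable <- variable >= 2 (likewise ==, !=)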
     
# Log transformation. If the data contain 0, add a constant first: the natural log of 0 is undefined (-Inf in R)
festival.data$logday1 <- log(festival.data$day1 + 1)

# Square-root transformation
festival.data$sqrtday1 <- sqrt(festival.data$day1)

# Reciprocal transformation (1/variable). If the data contain 0, 1/0 is a division by zero (Inf in R), so add a constant to all of the data first
festival.data$recday1 <- 1/(festival.data$day1 + 1)

# ifelse(condition, value if TRUE, value if FALSE)
festival.data$day1NoOutlier <- ifelse(festival.data$day1 > 4, NA, festival.data$day1)
-> Hygiene scores have a maximum of 4, so any value greater than 4 is replaced with NA; values of 4 or less are kept as they are.

3. Change the score: 1) replace it with the next highest score + 1; 2) convert back from a z-score (the raw score corresponding to the cut-off z, e.g., 3.29); 3) use the mean + (2 * SD)
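The three replacement rules can be sketched in R as follows. Everything here is an assumption for illustration: the toy hygiene scores, the > 4 cut-off used to flag the outlier, and the choice to compute the mean and SD from the non-outlying scores (including the outlier inflates the SD, so the replacement value can come out too large). z = 3.29 is used because |z| > 3.29 corresponds to p < .001 (two-tailed).

```r
# Toy hygiene scores on a 0-4 scale; 20.02 is an obvious outlier (made-up values).
scores <- c(1.2, 2.0, 2.5, 1.8, 2.2, 3.0, 20.02)
is_outlier <- scores > 4          # flag values above the scale maximum
clean <- scores[!is_outlier]      # scores used to compute the replacement values

# 1) Next highest score + 1
next_highest <- max(clean) + 1    # 3.0 + 1 = 4

# 2) Convert back from a z-score: X = mean + z * SD, here with z = 3.29
from_z <- mean(clean) + 3.29 * sd(clean)

# 3) Mean + 2 * SD
mean_2sd <- mean(clean) + 2 * sd(clean)

# Replace the flagged score with, e.g., option 1
scores_fixed <- ifelse(is_outlier, next_highest, scores)
scores_fixed
```

Whichever rule you pick, the replacement value keeps the case in the data while making it far less extreme than the original 20.02.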

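Alternatives to transformation
- Useful 1) when transformation does not solve the problem, or 2) given the drawbacks of transforming (e.g., data that were already normally distributed get transformed too):
1. Nonparametric tests: usable, but only in a limited range of situations
2. Robust tests:
1) Trimming both tails: the accuracy of the mean depends on the distribution being symmetric; cutting off both ends also removes outliers and skew
-> so the result is accurate even when the distribution is not symmetric.
- Trimmed mean: the mean of the distribution after a fixed percentage of scores has been trimmed from both extremes; the amount trimmed is chosen by the analyst
- M-estimator: the amount of trimming is determined empirically from the data
2) Bootstrap: by bootstrapping the sample (resampling it with replacement), we can estimate the sampling distribution.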
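A minimal base-R sketch of the two robust ideas above, on a made-up skewed sample: mean(x, trim = 0.2) drops 20% of the observations from each end before averaging, and a percentile bootstrap resamples the data with replacement to approximate the sampling distribution of that trimmed mean. The 20% trim and the 2,000 resamples are common choices, not values from the source. For an M-estimator, a Huber estimate of location is available as MASS::huber(), not shown here.

```r
set.seed(42)
# Made-up sample: 30 roughly normal scores plus three extreme values.
x <- c(rnorm(30, mean = 10, sd = 2), 40, 45, 50)

mean(x)              # ordinary mean, pulled upward by the extreme values
mean(x, trim = 0.2)  # 20% trimmed mean: ignores the lowest and highest 20%

# Percentile bootstrap of the trimmed mean:
# resample x with replacement, recompute the statistic, repeat.
boot_est <- replicate(2000, mean(sample(x, replace = TRUE), trim = 0.2))
sd(boot_est)                          # bootstrap estimate of the standard error
quantile(boot_est, c(0.025, 0.975))   # 95% percentile confidence interval
```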

Handling missing values

# Count the missing values in each case (row)
festival.data$daysMissing <- rowSums(cbind(is.na(festival.data$day1),
					is.na(festival.data$day2),
					is.na(festival.data$day3)))
                                                        
# Mean of the available days, ignoring missing values
festival.data$meanHygiene <- rowMeans(cbind(festival.data$day1,
					festival.data$day2,
					festival.data$day3), na.rm = TRUE)
* With na.rm = FALSE (the default), any row with even one missing value is not computed (result: NA).
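To check what the missing-value code does, here is a tiny made-up stand-in for festival.data (the real data set comes from the book; these three cases are hypothetical). It uses the slightly shorter rowSums(is.na(...)) form, which counts the same thing as the cbind() version.

```r
# Hypothetical stand-in for festival.data: three cases, hygiene scores on three days.
festival.data <- data.frame(day1 = c(2.0, 1.5, 0.9),
                            day2 = c(1.0,  NA,  NA),
                            day3 = c(3.0, 2.5,  NA))
days <- c("day1", "day2", "day3")

# Number of missing days per case: 0, 1, 2
festival.data$daysMissing <- rowSums(is.na(festival.data[, days]))

# Default na.rm = FALSE: any NA in a row makes that row's mean NA -> 2, NA, NA
festival.data$meanAll <- rowMeans(festival.data[, days])

# na.rm = TRUE: average whatever days are available -> 2, 2, 0.9
festival.data$meanHygiene <- rowMeans(festival.data[, days], na.rm = TRUE)
festival.data
```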

Source and reference: Andy Field, 'Discovering Statistics Using R' (Korean edition: '앤디 필드의 유쾌한 R 통계학')
