
AI Day 7 (2023-05-16) Artificial Intelligence Basics _Machine Learning - outliers

by prometedor 2023. 5. 16.

outliers

https://miro.medium.com/v2/resize:fit:1400/format:webp/1*0MPDTLn8KoLApoFvI0P2vQ.png

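IQR (interquartile range) : a method for filtering out outliers based on the spread between quartile values
ㄴ Sort the whole dataset and split it into four equal parts at Q1 (25%), Q2 (50%), Q3 (75%), and Q4 (100%); the IQR is the width of the interval from Q1 to Q3, i.e. IQR = Q3 - Q1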
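
As a worked example of the rule, here is a sketch that assumes the 14-value sample array and the 1.5 multiplier used in ml18_outliers.py below, together with np.percentile's default linear interpolation (which gives Q1 = 3.25 and Q3 = 9.75 for that array):

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 = 9.75 - 3.25 = 6.5 \\
\text{lower bound} &= Q_1 - 1.5 \cdot \mathrm{IQR} = 3.25 - 9.75 = -6.5 \\
\text{upper bound} &= Q_3 + 1.5 \cdot \mathrm{IQR} = 9.75 + 9.75 = 19.5
\end{aligned}
```

Any value outside [-6.5, 19.5] is treated as an outlier, so for that array -50, -10, and 50 are flagged.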

 

 

ml18_outliers.py

import numpy as np

oliers = np.array([-50, -10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50])

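def outliers(data_out) :
    # 25th, 50th, and 75th percentiles (Q1, median, Q3)
    quartile_1, q2, quartile_3 = np.percentile(data_out,
                                               [25, 50, 75])
    print('1st quartile : ', quartile_1)
    print('2nd quartile : ', q2)
    print('3rd quartile : ', quartile_3)

    # IQR is the distance between Q1 and Q3
    iqr = quartile_3 - quartile_1
    print('IQR : ', iqr)

    # Values more than 1.5 * IQR outside Q1 or Q3 are treated as outliers
    lower_bound = quartile_1 - (iqr * 1.5)
    upper_bound = quartile_3 + (iqr * 1.5)
    print('lower_bound : ', lower_bound)
    print('upper_bound : ', upper_bound)
    # np.where returns a tuple of index arrays for the outlier positions
    return np.where((data_out > upper_bound) |
                    (data_out < lower_bound))

outliers_loc = outliers(oliers)
print('Outlier positions : ', outliers_loc)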

# Visualization
import matplotlib.pyplot as plt
plt.boxplot(oliers)
plt.show()
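
The script above only reports where the outliers are. A minimal sketch of actually dropping them with a boolean mask might look like the following; `remove_outliers_iqr` and its `k` parameter are hypothetical names introduced for illustration, not part of the original script.

```python
import numpy as np

oliers = np.array([-50, -10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50])

def remove_outliers_iqr(data, k=1.5):
    """Hypothetical helper: keep only the values inside the IQR bounds."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    # True for values that lie within [lower_bound, upper_bound]
    mask = (data >= lower_bound) & (data <= upper_bound)
    return data[mask]

cleaned = remove_outliers_iqr(oliers)
print(cleaned)  # -50, -10, and 50 are dropped; 2 through 12 remain
```

Returning the filtered array rather than the indices is just one design choice; keeping the mask around is more convenient when the same rows have to be dropped from a matching label array.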

 

 

ml18_outliers_EllipticEnvelope.py

import numpy as np

oliers = np.array([-50, -10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50])
oliers = oliers.reshape(-1, 1)
print(oliers.shape) # (14, 1)

from sklearn.covariance import EllipticEnvelope # outlier detection
outliers = EllipticEnvelope(contamination=.1)   # contamination : expected proportion of outliers in the data --> flag about 10% of the samples
outliers.fit(oliers)
result = outliers.predict(oliers)

print(result)	# [-1  1  1  1  1  1  1  1  1  1  1  1  1 -1]
print(result.shape)	# (14,)
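
In the predict output, -1 marks an outlier and 1 marks an inlier. As a rough sketch of how those labels could be used to filter the data, the example below hard-codes the `result` array exactly as printed above instead of refitting the model, so it runs with NumPy alone; `outlier_idx` and `inliers` are names chosen here for illustration.

```python
import numpy as np

oliers = np.array([-50, -10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50])
oliers = oliers.reshape(-1, 1)

# Labels copied from the printed predict output above (not recomputed here)
result = np.array([-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1])

# Positions labelled as outliers (-1)
outlier_idx = np.where(result == -1)[0]
# Rows labelled as inliers (1); the boolean mask selects along the first axis
inliers = oliers[result == 1]

print(outlier_idx)      # positions 0 and 13
print(inliers.ravel())  # -10 and 2 through 12 remain
```

Note that with these labels -10 is kept, whereas the 1.5 * IQR rule in ml18_outliers.py also flags it because it falls below the lower bound of -6.5.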