Studying Data Analysis

[PANDAS] (if-else) Creating a New Column Based on Conditions

Python

Eileen's 2021. 10. 18. 17:12

[Multiple Conditions]

 

* Method 1 : np.select()

numpy.select(condlist, choicelist, default=0)

Return an array drawn from elements in choicelist, depending on conditions.

-> Lists let you state multiple conditions cleanly, and because the selection is vectorized it runs fast.

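import numpy as np

# One boolean mask per bucket; the masks do not overlap.
conditions = [(hr['HourlyRate'] < 30),
             (hr['HourlyRate'] >= 30) & (hr['HourlyRate'] < 50),
             (hr['HourlyRate'] >= 50) & (hr['HourlyRate'] < 70),
             (hr['HourlyRate'] >= 70) & (hr['HourlyRate'] < 90),
             (hr['HourlyRate'] >= 90)]

# Category label for each mask, in the same order.
choices = [1, 2, 3, 4, 5]

# Rows that match no mask (e.g. NaN) get the default value, 0.
hr['HourlyRate_cat'] = np.select(conditions, choices)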
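
To try the np.select() pattern without the original dataset, here is a minimal sketch. The small `hr` DataFrame and its values are made up for illustration, and `default=-1` is my own choice to make unmatched rows (such as NaN) easy to spot.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the post's `hr` DataFrame (values are invented).
hr = pd.DataFrame({"HourlyRate": [25, 30, 49, 50, 75, 90, np.nan]})

rate = hr["HourlyRate"]
conditions = [
    rate < 30,
    (rate >= 30) & (rate < 50),
    (rate >= 50) & (rate < 70),
    (rate >= 70) & (rate < 90),
    rate >= 90,
]
choices = [1, 2, 3, 4, 5]

# Comparisons against NaN are all False, so that row falls through to `default`.
hr["HourlyRate_cat"] = np.select(conditions, choices, default=-1)
print(hr)
```

When more than one condition is True for a row, np.select() uses the first match in the list, so the order of `conditions` matters if the masks overlap.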

 

 

* Method 2 : df.apply()

# With axis=1, apply() passes one row at a time, so the argument is a row, not the DataFrame.
def income_cat(row):
  if row['MonthlyIncome'] < 3000:
    return 1
  elif row['MonthlyIncome'] < 5000:
    return 2
  elif row['MonthlyIncome'] < 7000:
    return 3
  elif row['MonthlyIncome'] < 9000:
    return 4
  else:
    return 5

hr['MonthlyIncome_cat'] = hr.apply(income_cat, axis=1)
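
Since the function only reads one column, a variant is to call apply() on that column alone, so each call receives a single value instead of a whole row. This is a sketch of that variant, not the post's exact code. The toy `hr` data is invented.

```python
import pandas as pd

# Toy stand-in for the post's `hr` DataFrame (values are invented).
hr = pd.DataFrame({"MonthlyIncome": [2500, 3000, 6999, 8000, 12000]})

def income_cat(income):
    """Map a single income value to a category 1-5 (thresholds from the post)."""
    if income < 3000:
        return 1
    elif income < 5000:
        return 2
    elif income < 7000:
        return 3
    elif income < 9000:
        return 4
    return 5

# Series.apply() passes each value of the column to the function.
hr["MonthlyIncome_cat"] = hr["MonthlyIncome"].apply(income_cat)
print(hr)
```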

 

* Method 3 : df.loc[]

# Each assignment overwrites earlier ones on the rows it matches,
# so start with the widest condition and narrow down.
hr.loc[hr['MonthlyRate'] >= 23000 , 'MonthlyRate_cat'] = 5
hr.loc[hr['MonthlyRate'] < 23000 , 'MonthlyRate_cat'] = 4
hr.loc[hr['MonthlyRate'] < 18000 , 'MonthlyRate_cat'] = 3
hr.loc[hr['MonthlyRate'] < 13000 , 'MonthlyRate_cat'] = 2
hr.loc[hr['MonthlyRate'] < 8000 , 'MonthlyRate_cat'] = 1
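
With .loc assignments, a later line overwrites an earlier one wherever their conditions overlap. One way to make the result independent of line order is to give every bucket an explicit lower and upper bound. A minimal sketch with invented data follows. The evenly spaced boundaries (8000, 13000, 18000, 23000) are an assumption.

```python
import pandas as pd

# Toy stand-in for the post's `hr` DataFrame (values are invented).
hr = pd.DataFrame({"MonthlyRate": [5000, 10000, 15000, 20000, 25000]})

rate = hr["MonthlyRate"]

# Non-overlapping ranges: each row matches exactly one condition.
hr.loc[rate < 8000, "MonthlyRate_cat"] = 1
hr.loc[(rate >= 8000) & (rate < 13000), "MonthlyRate_cat"] = 2
hr.loc[(rate >= 13000) & (rate < 18000), "MonthlyRate_cat"] = 3
hr.loc[(rate >= 18000) & (rate < 23000), "MonthlyRate_cat"] = 4
hr.loc[rate >= 23000, "MonthlyRate_cat"] = 5

# The column is created as float (unassigned rows start as NaN);
# cast to int once every row has been filled.
hr["MonthlyRate_cat"] = hr["MonthlyRate_cat"].astype(int)
print(hr)
```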

 

 

[Few Conditions]

 

* Method 1 : np.where()

numpy.where(condition[, x, y])

Return elements chosen from x or y depending on condition.

hr['Age_cat'] = np.where(hr['Age'].values > 20, 'Adult', 'Child')
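
A small runnable sketch of the np.where() call, using an invented `hr` DataFrame. Note that the comparison is strict, so an age of exactly 20 lands in 'Child'.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the post's `hr` DataFrame (values are invented).
hr = pd.DataFrame({"Age": [18, 20, 21, 45]})

# Where the condition is True take 'Adult', otherwise 'Child'.
hr["Age_cat"] = np.where(hr["Age"] > 20, "Adult", "Child")
print(hr)
```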

 

* Method 2 : List Comprehension

hr['Age_cat'] = ['Adult' if age > 20 else 'Child' for age in hr['Age']]

 

* Method 3 : apply, Lambda

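-> Conditions can be chained, e.g. ...else ('baby' if x < 10 else 'child') ), but the expression quickly becomes hard to read, so with many conditions the methods under [Multiple Conditions] above are recommended.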

hr['Age_cat'] = hr['Age'].apply(lambda x : 'Adult' if x > 20 else 'Child')
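
To illustrate the note about chaining, here is a sketch of a three-way lambda next to the same logic written with np.select(). The toy data and the `Age_cat_select` column name are invented for the comparison.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the post's `hr` DataFrame (values are invented).
hr = pd.DataFrame({"Age": [5, 15, 20, 30]})

# Chained conditional expression: readable at two levels, awkward beyond that.
hr["Age_cat"] = hr["Age"].apply(
    lambda x: "Adult" if x > 20 else ("baby" if x < 10 else "child")
)

# Same logic with np.select(): one condition per label, plus a default.
age = hr["Age"]
hr["Age_cat_select"] = np.select(
    [age > 20, age < 10],
    ["Adult", "baby"],
    default="child",
)
print(hr)
```

Both columns hold the same labels. The np.select() form stays flat as more categories are added, while the lambda gains another level of nesting each time.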