728x90
INSERT¶
- CONCAT(db:union) : 붙이기 (위 + 아래 axis=0를 주로 씀) 조인이 되긴하지만 쓰지마라
- MERGE(db:join)
In [145]:
import pandas as pd
import numpy as np
df=pd.read_csv("./emp2.csv") #csv파일 불러서 df변수에 저장
df= df.fillna(0)
df['COMM']=df['COMM'].astype(int)
df['MGR']=df['MGR'].astype(int)
df
Out[145]:
EMPNO | ENAME | JOB | MGR | HIRDDATE | SAL | COMM | DEPTNO | |
---|---|---|---|---|---|---|---|---|
0 | 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 1800 | 0 | 30 |
1 | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600 | 300 | 30 |
2 | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 |
3 | 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | 0 | 20 |
4 | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250 | 1400 | 30 |
5 | 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850 | 0 | 30 |
6 | 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450 | 0 | 10 |
7 | 7788 | SCOTT | ANALYST | 7566 | 1987-07-13 | 3000 | 0 | 20 |
8 | 7839 | KING | PRESIDENT | 0 | 1981-11-17 | 5000 | 0 | 10 |
9 | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500 | 0 | 30 |
10 | 7876 | ADAMS | CLERK | 7788 | 1987-07-13 | 1100 | 0 | 20 |
11 | 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | 0 | 30 |
12 | 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 6000 | 0 | 20 |
13 | 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300 | 0 | 10 |
In [147]:
list=[col.lower() for col in df.columns.tolist()]
print(list)
df.columns=list
df
['empno', 'ename', 'job', 'mgr', 'hirddate', 'sal', 'comm', 'deptno']
Out[147]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|
0 | 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 1800 | 0 | 30 |
1 | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600 | 300 | 30 |
2 | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 |
3 | 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | 0 | 20 |
4 | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250 | 1400 | 30 |
5 | 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850 | 0 | 30 |
6 | 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450 | 0 | 10 |
7 | 7788 | SCOTT | ANALYST | 7566 | 1987-07-13 | 3000 | 0 | 20 |
8 | 7839 | KING | PRESIDENT | 0 | 1981-11-17 | 5000 | 0 | 10 |
9 | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500 | 0 | 30 |
10 | 7876 | ADAMS | CLERK | 7788 | 1987-07-13 | 1100 | 0 | 20 |
11 | 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | 0 | 30 |
12 | 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 6000 | 0 | 20 |
13 | 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300 | 0 | 10 |
In [66]:
df10=df[df['deptno']==10].copy()
df10
Out[66]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|
6 | 7782 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450 | NaN | 10 |
8 | 7839 | KING | PRESIDENT | NaN | 1981-11-17 | 5000 | NaN | 10 |
13 | 7934 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300 | NaN | 10 |
In [15]:
df20=df[df['deptno']==20].copy()
df20
Out[15]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|
3 | 7566 | JONES | MANAGER | 7839.0 | 1981-04-02 | 2975 | NaN | 20 |
7 | 7788 | SCOTT | ANALYST | 7566.0 | 1987-07-13 | 3000 | NaN | 20 |
10 | 7876 | ADAMS | CLERK | 7788.0 | 1987-07-13 | 1100 | NaN | 20 |
12 | 7902 | FORD | ANALYST | 7566.0 | 1981-12-03 | 6000 | NaN | 20 |
- pd.concat([df명,df명], axis=0 or 1, ignore_index=True)
- axis=0 -> 아래로
- axis=1 -> 옆으로
- ignore_index=True -> 합치고 인덱스 재정렬
- concat axis=1로 했다 -> outerjoin
CONCAT() - UNION¶
In [27]:
df1020_1 = pd.concat([df10,df20], axis=0) #위 아래로 합치기 (로우를 늘린다)
df1020_1
Out[27]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|
6 | 7782 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450 | NaN | 10 |
8 | 7839 | KING | PRESIDENT | NaN | 1981-11-17 | 5000 | NaN | 10 |
13 | 7934 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300 | NaN | 10 |
3 | 7566 | JONES | MANAGER | 7839.0 | 1981-04-02 | 2975 | NaN | 20 |
7 | 7788 | SCOTT | ANALYST | 7566.0 | 1987-07-13 | 3000 | NaN | 20 |
10 | 7876 | ADAMS | CLERK | 7788.0 | 1987-07-13 | 1100 | NaN | 20 |
12 | 7902 | FORD | ANALYST | 7566.0 | 1981-12-03 | 6000 | NaN | 20 |
In [26]:
df1020_1 = pd.concat([df10,df20], axis=0, ignore_index=True) #ignore_index=True 인덱스 재정렬
df1020_1
Out[26]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|
0 | 7782 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450 | NaN | 10 |
1 | 7839 | KING | PRESIDENT | NaN | 1981-11-17 | 5000 | NaN | 10 |
2 | 7934 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300 | NaN | 10 |
3 | 7566 | JONES | MANAGER | 7839.0 | 1981-04-02 | 2975 | NaN | 20 |
4 | 7788 | SCOTT | ANALYST | 7566.0 | 1987-07-13 | 3000 | NaN | 20 |
5 | 7876 | ADAMS | CLERK | 7788.0 | 1987-07-13 | 1100 | NaN | 20 |
6 | 7902 | FORD | ANALYST | 7566.0 | 1981-12-03 | 6000 | NaN | 20 |
좌우합치기 (CONCAT)¶
In [25]:
df1020_2 = pd.concat([df10,df20], axis=1) #좌, 우 옆으로 합치기 (컬럼을 늘린다)
df1020_2
# 없으면 없는 대로 NaN처리 -> outerjoin (일반적이지 않음)
Out[25]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7566.0 | JONES | MANAGER | 7839.0 | 1981-04-02 | 2975.0 | NaN | 20.0 |
6 | 7782.0 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450.0 | NaN | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7788.0 | SCOTT | ANALYST | 7566.0 | 1987-07-13 | 3000.0 | NaN | 20.0 |
8 | 7839.0 | KING | PRESIDENT | NaN | 1981-11-17 | 5000.0 | NaN | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7876.0 | ADAMS | CLERK | 7788.0 | 1987-07-13 | 1100.0 | NaN | 20.0 |
12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7902.0 | FORD | ANALYST | 7566.0 | 1981-12-03 | 6000.0 | NaN | 20.0 |
13 | 7934.0 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300.0 | NaN | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
In [24]:
df1020_2 = pd.concat([df10,df20], axis=1, ignore_index=True) #ignore_index=True 인덱스 재정렬
df1020_2
Out[24]:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7566.0 | JONES | MANAGER | 7839.0 | 1981-04-02 | 2975.0 | NaN | 20.0 |
6 | 7782.0 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450.0 | NaN | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7788.0 | SCOTT | ANALYST | 7566.0 | 1987-07-13 | 3000.0 | NaN | 20.0 |
8 | 7839.0 | KING | PRESIDENT | NaN | 1981-11-17 | 5000.0 | NaN | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7876.0 | ADAMS | CLERK | 7788.0 | 1987-07-13 | 1100.0 | NaN | 20.0 |
12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7902.0 | FORD | ANALYST | 7566.0 | 1981-12-03 | 6000.0 | NaN | 20.0 |
13 | 7934.0 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300.0 | NaN | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
join=inner (CONCAT)¶
In [30]:
df1020_2 = pd.concat([df,df10], axis=1, join='inner') #겹치는 컬럼이 있으면 join
df1020_2 #deptno <- 겹치는 부분 으로 join
Out[30]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 7782 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450 | NaN | 10 | 7782 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450 | NaN | 10 |
8 | 7839 | KING | PRESIDENT | NaN | 1981-11-17 | 5000 | NaN | 10 | 7839 | KING | PRESIDENT | NaN | 1981-11-17 | 5000 | NaN | 10 |
13 | 7934 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300 | NaN | 10 | 7934 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300 | NaN | 10 |
Series 합치기 (CONCAT)¶
- pd.Series()
- pd.DataFrame()
pd.Series()¶
In [32]:
s1 = pd.Series(data=[1,2,3,4,5]) #파라미터 속성 : data index name
s1
Out[32]:
0 1
1 2
2 3
3 4
4 5
dtype: int64
In [34]:
#인덱스 변경 (defalt = 0,1,2,..)
s1 = pd.Series(data=[1,2,3,4,5], index=[10,20,30,40,50]) #파라미터 속성 : data index name
s1
Out[34]:
10 1
20 2
30 3
40 4
50 5
dtype: int64
In [35]:
# 네임 설정(Series의 이름 설정) -> CONCAT시 칼럼명으로 활용
s1 = pd.Series(data=[1,2,3,4,5], index=[10,20,30,40,50], name='s1name') #파라미터 속성 : data index name
s1
Out[35]:
10 1
20 2
30 3
40 4
50 5
Name: s1name, dtype: int64
In [39]:
s1 = pd.Series([1,2,3,4,5], [10,20,30,40,50]) #파라미터 명시하지말고 그냥 이렇게 써라
s1
Out[39]:
10 1
20 2
30 3
40 4
50 5
dtype: int64
In [41]:
s1= pd.Series([1,2,3,4,5])
s2= pd.Series([10,20,30,40,50])
s12= pd.concat([s1,s2]) #defalt: axis=0
s12
Out[41]:
0 1
1 2
2 3
3 4
4 5
0 10
1 20
2 30
3 40
4 50
dtype: int64
In [44]:
s12= pd.concat([s1,s2], axis=1)
print(type(s12)) #Series 합치면 df가 된다
s12.columns = ['col1','col2']
s12
<class 'pandas.core.frame.DataFrame'>
Out[44]:
col1 | col2 | |
---|---|---|
0 | 1 | 10 |
1 | 2 | 20 |
2 | 3 | 30 |
3 | 4 | 40 |
4 | 5 | 50 |
In [45]:
s1= pd.Series(data=[1,2,3,4,5], name='col1') #name이 컬럼명으로 들어간다.
s2= pd.Series(data=[10,20,30,40,50], name='col2')
s12= pd.concat([s1,s2], axis=1)
s12
Out[45]:
col1 | col2 | |
---|---|---|
0 | 1 | 10 |
1 | 2 | 20 |
2 | 3 | 30 |
3 | 4 | 40 |
4 | 5 | 50 |
DataFrame에 List 합치기¶
list + DF¶
In [122]:
list_data = [8000,'CHOI','CLERK','7782.0','1995-05-31',2000,np.NaN,10]
list_data
Out[122]:
[8000, 'CHOI', 'CLERK', '7782.0', '1995-05-31', 2000, nan, 10]
In [123]:
df_c=pd.DataFrame(data=[list_data], columns=df10.columns) #columns = df10.columns -> df10의 컬럼을 가져와라
df_c
Out[123]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|
0 | 8000 | CHOI | CLERK | 7782.0 | 1995-05-31 | 2000 | NaN | 10 |
In [124]:
df2= pd.concat([df10,df_c], axis=0)
df2
Out[124]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|
6 | 7782 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450 | NaN | 10 |
8 | 7839 | KING | PRESIDENT | NaN | 1981-11-17 | 5000 | NaN | 10 |
13 | 7934 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300 | NaN | 10 |
0 | 8000 | CHOI | CLERK | 7782.0 | 1995-05-31 | 2000 | NaN | 10 |
In [125]:
list_data2 = [8000,'CHOI','CLERK','7782.0','1995-05-31',2000,np.NaN,10]
df_c2=pd.DataFrame(data=list_data2)
df_c2
Out[125]:
0 | |
---|---|
0 | 8000 |
1 | CHOI |
2 | CLERK |
3 | 7782.0 |
4 | 1995-05-31 |
5 | 2000 |
6 | NaN |
7 | 10 |
In [126]:
df_c2.T #세로를 가로로 만들기
Out[126]:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|---|
0 | 8000 | CHOI | CLERK | 7782.0 | 1995-05-31 | 2000 | NaN | 10 |
MERGE() - JOIN¶
- how : {'left','right','outer','inner'}--> {왼쪽기준outer,오른쪽기준outer,풀outer,이너[defalt]}
- 키가 동일하면 : on
- 좌 우 프레임의 조인할 키가 다를 경우 : left_on, right_on
In [159]:
# 기준열 이름이 같을 때
#pd.merge(left, right, on = '기준열', how = '조인방식')
# 기준열 이름이 다를 때
#pd.merge(left, right, left_on = '왼쪽 열', right_on = '오른쪽 열', how = '조인방식')
In [167]:
emp=df.copy()
emp
Out[167]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | |
---|---|---|---|---|---|---|---|---|
0 | 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 1800 | 0 | 30 |
1 | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600 | 300 | 30 |
2 | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 |
3 | 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | 0 | 20 |
4 | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250 | 1400 | 30 |
5 | 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850 | 0 | 30 |
6 | 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450 | 0 | 10 |
7 | 7788 | SCOTT | ANALYST | 7566 | 1987-07-13 | 3000 | 0 | 20 |
8 | 7839 | KING | PRESIDENT | 0 | 1981-11-17 | 5000 | 0 | 10 |
9 | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500 | 0 | 30 |
10 | 7876 | ADAMS | CLERK | 7788 | 1987-07-13 | 1100 | 0 | 20 |
11 | 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | 0 | 30 |
12 | 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 6000 | 0 | 20 |
13 | 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300 | 0 | 10 |
In [177]:
df_join = pd.merge(emp,emp[['empno','ename']], left_on='mgr',right_on='empno',how='left')
df_join
Out[177]:
empno_x | ename_x | job | mgr | hirddate | sal | comm | deptno | empno_y | ename_y | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 1800 | 0 | 30 | 7902.0 | FORD |
1 | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600 | 300 | 30 | 7698.0 | BLAKE |
2 | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 | 7698.0 | BLAKE |
3 | 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | 0 | 20 | 7839.0 | KING |
4 | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250 | 1400 | 30 | 7698.0 | BLAKE |
5 | 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850 | 0 | 30 | 7839.0 | KING |
6 | 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450 | 0 | 10 | 7839.0 | KING |
7 | 7788 | SCOTT | ANALYST | 7566 | 1987-07-13 | 3000 | 0 | 20 | 7566.0 | JONES |
8 | 7839 | KING | PRESIDENT | 0 | 1981-11-17 | 5000 | 0 | 10 | NaN | NaN |
9 | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500 | 0 | 30 | 7698.0 | BLAKE |
10 | 7876 | ADAMS | CLERK | 7788 | 1987-07-13 | 1100 | 0 | 20 | 7788.0 | SCOTT |
11 | 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | 0 | 30 | 7698.0 | BLAKE |
12 | 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 6000 | 0 | 20 | 7566.0 | JONES |
13 | 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300 | 0 | 10 | 7782.0 | CLARK |
In [161]:
dept_list = [[10,'ACCOUNTING','NEW YORK'],[20,'RESEARCH','DALLAS'],[30,'SALES','CHICAGO'],[40,'OPERATIONS','BOSTON']]
dept_col = ['deptno','dname','loc']
dept_df = pd.DataFrame(data=dept_list, columns=dept_col)
dept_df
Out[161]:
deptno | dname | loc | |
---|---|---|---|
0 | 10 | ACCOUNTING | NEW YORK |
1 | 20 | RESEARCH | DALLAS |
2 | 30 | SALES | CHICAGO |
3 | 40 | OPERATIONS | BOSTON |
In [166]:
df_join2 = pd.merge(df,dept_df, on='deptno',how='right')
df_join2
Out[166]:
empno | ename | job | mgr | hirddate | sal | comm | deptno | dname | loc | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 7782.0 | CLARK | MANAGER | 7839.0 | 1981-06-09 | 2450.0 | 0.0 | 10 | ACCOUNTING | NEW YORK |
1 | 7839.0 | KING | PRESIDENT | 0.0 | 1981-11-17 | 5000.0 | 0.0 | 10 | ACCOUNTING | NEW YORK |
2 | 7934.0 | MILLER | CLERK | 7782.0 | 1982-01-23 | 1300.0 | 0.0 | 10 | ACCOUNTING | NEW YORK |
3 | 7566.0 | JONES | MANAGER | 7839.0 | 1981-04-02 | 2975.0 | 0.0 | 20 | RESEARCH | DALLAS |
4 | 7788.0 | SCOTT | ANALYST | 7566.0 | 1987-07-13 | 3000.0 | 0.0 | 20 | RESEARCH | DALLAS |
5 | 7876.0 | ADAMS | CLERK | 7788.0 | 1987-07-13 | 1100.0 | 0.0 | 20 | RESEARCH | DALLAS |
6 | 7902.0 | FORD | ANALYST | 7566.0 | 1981-12-03 | 6000.0 | 0.0 | 20 | RESEARCH | DALLAS |
7 | 7369.0 | SMITH | CLERK | 7902.0 | 1980-12-17 | 1800.0 | 0.0 | 30 | SALES | CHICAGO |
8 | 7499.0 | ALLEN | SALESMAN | 7698.0 | 1981-02-20 | 1600.0 | 300.0 | 30 | SALES | CHICAGO |
9 | 7521.0 | WARD | SALESMAN | 7698.0 | 1981-02-22 | 1250.0 | 500.0 | 30 | SALES | CHICAGO |
10 | 7654.0 | MARTIN | SALESMAN | 7698.0 | 1981-09-28 | 1250.0 | 1400.0 | 30 | SALES | CHICAGO |
11 | 7698.0 | BLAKE | MANAGER | 7839.0 | 1981-05-01 | 2850.0 | 0.0 | 30 | SALES | CHICAGO |
12 | 7844.0 | TURNER | SALESMAN | 7698.0 | 1981-09-08 | 1500.0 | 0.0 | 30 | SALES | CHICAGO |
13 | 7900.0 | JAMES | CLERK | 7698.0 | 1981-12-03 | 950.0 | 0.0 | 30 | SALES | CHICAGO |
14 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40 | OPERATIONS | BOSTON |
In [175]:
emp2 = emp.rename(columns={"deptno":"DNO"})
emp2
Out[175]:
empno | ename | job | mgr | hirddate | sal | comm | DNO | |
---|---|---|---|---|---|---|---|---|
0 | 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 1800 | 0 | 30 |
1 | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600 | 300 | 30 |
2 | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 |
3 | 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | 0 | 20 |
4 | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250 | 1400 | 30 |
5 | 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850 | 0 | 30 |
6 | 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450 | 0 | 10 |
7 | 7788 | SCOTT | ANALYST | 7566 | 1987-07-13 | 3000 | 0 | 20 |
8 | 7839 | KING | PRESIDENT | 0 | 1981-11-17 | 5000 | 0 | 10 |
9 | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500 | 0 | 30 |
10 | 7876 | ADAMS | CLERK | 7788 | 1987-07-13 | 1100 | 0 | 20 |
11 | 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | 0 | 30 |
12 | 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 6000 | 0 | 20 |
13 | 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300 | 0 | 10 |
In [179]:
emp_dept2 = pd.merge(emp2, dept_df, how="inner", left_on="DNO", right_on='deptno')
emp_dept2
Out[179]:
empno | ename | job | mgr | hirddate | sal | comm | DNO | deptno | dname | loc | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 1800 | 0 | 30 | 30 | SALES | CHICAGO |
1 | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600 | 300 | 30 | 30 | SALES | CHICAGO |
2 | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 | 30 | SALES | CHICAGO |
3 | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250 | 1400 | 30 | 30 | SALES | CHICAGO |
4 | 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850 | 0 | 30 | 30 | SALES | CHICAGO |
5 | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500 | 0 | 30 | 30 | SALES | CHICAGO |
6 | 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | 0 | 30 | 30 | SALES | CHICAGO |
7 | 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | 0 | 20 | 20 | RESEARCH | DALLAS |
8 | 7788 | SCOTT | ANALYST | 7566 | 1987-07-13 | 3000 | 0 | 20 | 20 | RESEARCH | DALLAS |
9 | 7876 | ADAMS | CLERK | 7788 | 1987-07-13 | 1100 | 0 | 20 | 20 | RESEARCH | DALLAS |
10 | 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 6000 | 0 | 20 | 20 | RESEARCH | DALLAS |
11 | 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450 | 0 | 10 | 10 | ACCOUNTING | NEW YORK |
12 | 7839 | KING | PRESIDENT | 0 | 1981-11-17 | 5000 | 0 | 10 | 10 | ACCOUNTING | NEW YORK |
13 | 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300 | 0 | 10 | 10 | ACCOUNTING | NEW YORK |
'Python > 데이터 분석' 카테고리의 다른 글
(데이터 분석) Pandas SQL처럼 활용하기 -정리 중- (0) | 2022.01.10 |
---|---|
(데이터 분석) 파이썬 - Pandas DataFrame SQL처럼 활용하기(CRUD) (0) | 2022.01.07 |
(데이터 분석)파이썬 - Pandas_dataframe (0) | 2022.01.06 |
(데이터 분석)파이썬 - Pandas DataFrame, Numpy array (0) | 2022.01.05 |
(데이터 분석) Pandas 가이드북 링크 (0) | 2022.01.05 |
댓글