pandas DataFrame

Installation
Basic structure of pandas data (files/DB)
The Xth item (indexing), items 1:4 (slicing)
reshape (1,8)--(8,1)
Missing values
categorical: apple 1, banana 2
Grouping
Visualization, time....

In [2]:

import pandas as pd   #pandas.py
import numpy as np    #numpy.py

Method 1) Build a DataFrame from data in a Python list []

In [57]:

# data=None,
# index: Axes | None = None,
# columns: Axes | None = None,
# dtype: Dtype | None = None,

data_list = [[1, 'kim', 111],
             [2, 'lee', 222] 
            ]
col_list = ['seq','id','pw']
df = pd.DataFrame(data   = data_list,
                  columns = col_list,
                  index = ['a','b']
                 )   #DataFrame.class
print(df.info())  # info() prints the summary itself and returns None, so print() adds the trailing "None"

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, a to b
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   seq     2 non-null      int64 
 1   id      2 non-null      object
 2   pw      2 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 64.0+ bytes
None

In [58]:

df.head()   # show only the first 5 rows (the default)

Out[58]:

	seq	id	pw
a	1	kim	111
b	2	lee	222

In [60]:

df.loc['a']

Out[60]:

seq      1
id     kim
pw     111
Name: a, dtype: object

In [61]:

df.iloc[0]

Out[61]:

seq      1
id     kim
pw     111
Name: a, dtype: object

In [65]:

df[ 0:1 ]

Out[65]:

	seq	id	pw
a	1	kim	111
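The three lookups above print similar values but return different kinds of object. A minimal sketch, rebuilding the same two-row frame; the names `row_by_label`, `row_by_pos`, and `row_slice` are introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 'kim', 111], [2, 'lee', 222]],
    columns=['seq', 'id', 'pw'],
    index=['a', 'b'],
)

row_by_label = df.loc['a']   # lookup by row label     -> Series
row_by_pos = df.iloc[0]      # lookup by row position  -> Series
row_slice = df[0:1]          # positional slice        -> DataFrame with one row

print(type(row_by_label).__name__)      # Series
print(type(row_slice).__name__)         # DataFrame
print(row_by_label.equals(row_by_pos))  # True: both are row 'a'
```

A single label or position collapses the row into a Series, while a slice keeps the two-dimensional DataFrame shape even when it contains one row.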

Method 2) Build a DataFrame from data in np.array(list)

In [5]:

data_list = [[1, 'kim', 111],
             [2, 'lee', 222] 
            ]
data_arr = np.array(data_list)
col_list = ['seq','id','pw']
df = pd.DataFrame(data    = data_arr,  # changed from a list to an array
                  columns = col_list
                 )   #DataFrame.class
print(df.info())  # desc df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   seq     2 non-null      object
 1   id      2 non-null      object
 2   pw      2 non-null      object
dtypes: object(3)
memory usage: 176.0+ bytes
None

In [6]:

df.head()   # show only the first 5 rows (the default)

Out[6]:

	seq	id	pw
0	1	kim	111
1	2	lee	222
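Note that every column above is reported as `object`, unlike Method 1 where `seq` and `pw` were `int64`. A NumPy array holds a single dtype, so mixing integers and strings turns every element into a string before pandas ever sees the data. The sketch below shows this and one possible way to restore the numeric columns; `df_fixed` is a name introduced here for illustration:

```python
import numpy as np
import pandas as pd

data_arr = np.array([[1, 'kim', 111],
                     [2, 'lee', 222]])
print(data_arr.dtype.kind)   # 'U': every element was converted to a string

df = pd.DataFrame(data_arr, columns=['seq', 'id', 'pw'])

# Convert the numeric columns back to integers explicitly.
df_fixed = df.astype({'seq': 'int64', 'pw': 'int64'})
print(df_fixed.dtypes)
```

If the data is mixed-type to begin with, passing the plain Python list (Method 1) avoids the conversion altogether.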

Method 3) Build a DataFrame from data in a Python dict {k:v}

In [45]:

s = pd.Series({'국어': 100, '영어': 95, '수학': 90})
s

Out[45]:

국어    100
영어     95
수학     90
dtype: int64

In [48]:

list_dict = [{'국어': 100, '영어': 90, '수학': 80},
             {'국어': 50, '영어': 55, '수학': 57},
             {'국어': 20, '영어': 22, '수학': 27}]
df = pd.DataFrame(list_dict)
# print(df.info())
# df.head()

In [55]:

score_dict = {'국어': [100, 50, 20],
              '영어': [90, 55, 22],
              '수학': [80, 57, 27]}  # named score_dict so the built-in dict is not shadowed
df = pd.DataFrame(score_dict)
df.columns = ['kor', 'eng', 'math']  # rename the columns (Korean, English, math)
print(df.info())
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   kor     3 non-null      int64
 1   eng     3 non-null      int64
 2   math    3 non-null      int64
dtypes: int64(3)
memory usage: 200.0 bytes
None

Out[55]:

	kor	eng	math
0	100	90	80
1	50	55	57
2	20	22	27

In [56]:

#s.index,   s.values
df.index  , df.values  ,df.columns

Out[56]:

(RangeIndex(start=0, stop=3, step=1),
 array([[100,  90,  80],
        [ 50,  55,  57],
        [ 20,  22,  27]], dtype=int64),
 Index(['kor', 'eng', 'math'], dtype='object'))
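The two dict-based inputs used in this section (a list of row dicts and a dict of column lists) describe the same table. A small sketch to confirm that, using English keys for brevity; `df_rows`, `df_cols`, and `df_renamed` are illustrative names:

```python
import pandas as pd

# Row-oriented: one dict per row.
rows = [{'kor': 100, 'eng': 90, 'math': 80},
        {'kor': 50, 'eng': 55, 'math': 57},
        {'kor': 20, 'eng': 22, 'math': 27}]

# Column-oriented: one list per column.
cols = {'kor': [100, 50, 20],
        'eng': [90, 55, 22],
        'math': [80, 57, 27]}

df_rows = pd.DataFrame(rows)
df_cols = pd.DataFrame(cols)
print(df_rows.equals(df_cols))  # True

# rename() returns a new frame and only touches the columns you name,
# an alternative to overwriting df.columns with a full list.
df_renamed = df_cols.rename(columns={'kor': 'korean'})
print(list(df_renamed.columns))  # ['korean', 'eng', 'math']
```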

Method 4) Build a DataFrame from file data (csv, txt, excel, html, hdf..)

In [7]:

# Columns: 매장명 (store), 제품종류 (product type), 모델명 (model), 판매 (sales), 재고 (stock)
df = pd.read_csv('./book/data/total_sales_data.csv')  # default sep=','

In [8]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   매장명     9 non-null      object
 1   제품종류    9 non-null      object
 2   모델명     9 non-null      object
 3   판매      9 non-null      int64 
 4   재고      9 non-null      int64 
dtypes: int64(2), object(3)
memory usage: 488.0+ bytes

In [9]:

df.head()

Out[9]:

	매장명	제품종류	모델명	판매	재고
0	A	스마트폰	S1	1	2
1	A	스마트폰	S2	2	5
2	A	TV	V1	3	5
3	B	스마트폰	S2	4	6
4	B	스마트폰	S1	5	8

In [10]:

df.tail(2)

Out[10]:

	매장명	제품종류	모델명	판매	재고
7	C	TV	V1	3	6
8	C	TV	V2	7	9

In [11]:

df.iloc[ [3,4] , [1,2] ]

Out[11]:

	제품종류	모델명
3	스마트폰	S2
4	스마트폰	S1

In [40]:

df[  ["재고","판매"]  ]

Out[40]:

	재고	판매
0	2	1
1	5	2
2	5	3
3	6	4
4	8	5
5	9	6
6	4	2
7	6	3
8	9	7

In [37]:

# boolean mask, the same idea as NumPy's res = a[ a > 3 ]
df[  df['재고'] == 6   ]

Out[37]:

	매장명	제품종류	모델명	판매	재고
3	B	스마트폰	S2	4	6
7	C	TV	V1	3	6

In [35]:

df[ ['판매','재고'] ].head(2)

Out[35]:

	판매	재고
0	1	2
1	2	5

In [36]:

df.iloc[  0:2  ,  [3,4] ]

Out[36]:

	판매	재고
0	1	2
1	2	5

In [26]:

df.재고

Out[26]:

0    2
1    5
2    5
3    6
4    8
5    9
6    4
7    6
8    9
Name: 재고, dtype: int64
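The selections above can be tried without the CSV file. The sketch below feeds `read_csv` an in-memory string containing only the five rows visible in the `df.head()` output above (not the full file), then combines two boolean conditions. The names `csv_text`, `sales`, `stock_is_5`, and `is_tv` are assumptions made for this example:

```python
import io
import pandas as pd

# Only the first five rows shown by df.head(); the real file has nine.
csv_text = """매장명,제품종류,모델명,판매,재고
A,스마트폰,S1,1,2
A,스마트폰,S2,2,5
A,TV,V1,3,5
B,스마트폰,S2,4,6
B,스마트폰,S1,5,8
"""
sales = pd.read_csv(io.StringIO(csv_text))

stock_is_5 = sales['재고'] == 5        # boolean Series, one value per row
is_tv = sales['제품종류'] == 'TV'

print(sales[stock_is_5])                        # rows where stock equals 5
print(sales[stock_is_5 & is_tv])                # both conditions (use &, not "and")
print(sales.loc[stock_is_5, ['모델명', '판매']])  # filter rows and pick columns in one step
```

Combining masks with `&` or `|` keeps the filter element-wise; Python's `and` / `or` would raise an error on a Series.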

DataFrame handling

In [114]:

df = pd.read_csv('./lec12_sample.txt', sep='\t')
df.index = ['A0', 'A1', 'A2', 'A3', 'A4' ]

In [68]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   col0    5 non-null      int64 
 1   col1    5 non-null      object
 2   col2    5 non-null      int64 
 3   col3    5 non-null      int64 
dtypes: int64(3), object(1)
memory usage: 288.0+ bytes

Note: this summary, like any later output whose row labels read 0 to 4, was captured before the index was relabeled to A0..A4, so it still shows a RangeIndex.

In [115]:

df.head()

Out[115]:

	col0	col1	col2	col3
A0	0	name0	0	0
A1	1	name1	10	1000
A2	2	name2	20	2000
A3	3	name3	30	3000
A4	4	name4	40	4000

In [73]:

arr = np.array([ [1,11,111,1111], 
                [2,22,222,2222], 
                [3,33,333,3333],
                [4,44,444,4444],
                [5,55,555,5555]
               ])
print(arr)

[[   1   11  111 1111]
 [   2   22  222 2222]
 [   3   33  333 3333]
 [   4   44  444 4444]
 [   5   55  555 5555]]

In [75]:

df.shape  , arr.shape

Out[75]:

((5, 4), (5, 4))

Handling arr

Single position   : arr[ 2 ],

Slicing           : arr[ s:e ]
List of positions : arr[ [1,2] ]

In [76]:

arr[ 0 ]

Out[76]:

array([   1,   11,  111, 1111])

In [79]:

arr[ 0:3 ]         # rows 0, 1, 2

Out[79]:

array([[   1,   11,  111, 1111],
       [   2,   22,  222, 2222],
       [   3,   33,  333, 3333]])

In [152]:

arr[ [0,2] ]

Out[152]:

array([[   1,   11,  111, 1111],
       [   3,   33,  333, 3333]])


Three ways to address rows and columns:   a single value 3,   a slice 2:4,   a list [1,2,3]
arr[row,  col]

df [column or condition]
df.loc[row,  col] 
df.iloc[row,  col]
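The three addressing styles listed above can be compared side by side on the same data. This is a minimal sketch; the frame `frame` with columns `c0`..`c3` is built here only for illustration and is not the sample file used in this post:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 11, 111, 1111],
                [2, 22, 222, 2222],
                [3, 33, 333, 3333],
                [4, 44, 444, 4444],
                [5, 55, 555, 5555]])
frame = pd.DataFrame(arr,
                     columns=['c0', 'c1', 'c2', 'c3'],
                     index=['A0', 'A1', 'A2', 'A3', 'A4'])

# NumPy: arr[row, col]
single = arr[3, 1]           # single value      -> 44
sliced = arr[2:4, 1]         # slice of rows     -> [33, 44]
listed = arr[[1, 2, 3], 1]   # list of rows      -> [22, 33, 44]

# pandas by position: same three styles through iloc
print(frame.iloc[3, 1])
print(frame.iloc[2:4, 1].tolist())
print(frame.iloc[[1, 2, 3], 1].tolist())

# pandas by label: same cell through loc
print(frame.loc['A3', 'c1'])
```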

In [ ]:

[[   1   11  111 1111]
 [   2   22  222 2222]
 [   3   33  333 3333]
 [   4   44  444 4444]
 [   5   55  555 5555]]

In [153]:

arr[0 , 1 ]  # way 1, single value: row 0, column 1

Out[153]:

11

In [98]:

df[ 'col1' ]

Out[98]:

0    name0
1    name1
2    name2
3    name3
4    name4
Name: col1, dtype: object

In [99]:

df[ ['col1','col2'] ] # the form you will use 99% of the time

Out[99]:

	col1	col2
0	name0	0
1	name1	10
2	name2	20
3	name3	30
4	name4	40

loc, iloc

In [107]:

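arr[  0, 0 ]                # single element
arr[  0:1, 0:2 ]            # row slice 0:1, column slice 0:2
arr[  [0,1,2], [1,2,3] ]    # paired positions (0,1), (1,2), (2,3); only this last result is displayed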

Out[107]:

array([  11,  222, 3333])
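The result above has three elements rather than a 3x3 block because NumPy pairs the two index lists element by element. A short sketch of the difference, using `np.ix_` to request the block instead; the names `paired`, `block`, and `frame_block` are illustrative:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 11, 111, 1111],
                [2, 22, 222, 2222],
                [3, 33, 333, 3333],
                [4, 44, 444, 4444],
                [5, 55, 555, 5555]])

# Two lists are zipped: (0,1), (1,2), (2,3)
paired = arr[[0, 1, 2], [1, 2, 3]]
print(paired)         # [  11  222 3333]

# np.ix_ builds the cross product: rows 0-2 x columns 1-3
block = arr[np.ix_([0, 1, 2], [1, 2, 3])]
print(block.shape)    # (3, 3)

# pandas iloc with two lists already behaves like the cross product
frame_block = pd.DataFrame(arr).iloc[[0, 1, 2], [1, 2, 3]]
print(frame_block.shape)  # (3, 3)
```

This is one place where `arr[rows, cols]` and `df.iloc[rows, cols]` are not interchangeable, so it is worth checking the shape after selecting with two lists.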

In [155]:

df.iloc[0:1    ]  # df.iloc[row, col]; here only rows are addressed

Out[155]:

	col0	col1	col2	col3
A0	0	name0	0	0

In [163]:

df.iloc[0]          # row 0 -> Series
df.iloc[0:2]        # rows 0, 1
df.iloc[ [0,1,2]]   # rows 0, 1, 2 (displayed below)

Out[163]:

	col0	col1	col2	col3
A0	0	name0	0	0
A1	1	name1	10	1000
A2	2	name2	20	2000

Fixing one row and controlling the columns

In [164]:

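df.iloc[1, 1 ]    # single column -> scalar 'name1'
df.iloc[1, 1:4]   # column slice -> Series (col1, col2, col3)
df.iloc[1, [1,2]] # column list -> Series (col1, col2), displayed below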

Out[164]:

col1    name1
col2       10
Name: A1, dtype: object

In [136]:

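df.iloc[1, 1 ]             # single row, single column -> scalar
df.iloc[1:3, 1:3 ]         # two slices -> DataFrame (rows 1-2, columns 1-2)
df.iloc[[1,2,3], [1,2]]    # two lists -> DataFrame
df.iloc[1, [1,2]]          # single row, column list -> Series
df.iloc[[1,2,3], 1:3]      # row list, column slice -> DataFrame (displayed below)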

Out[136]:

	col1	col2
A1	name1	10
A2	name2	20
A3	name3	30

In [145]:

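df.loc['A1'  , 'col1']                          # scalar
df.loc['A1'  , 'col1': 'col3'  ]                # Series; a label slice includes its end label
df.loc['A1'  , ['col1', 'col2']  ]              # Series
df.loc['A1':'A3'  , 'col1': 'col3'  ]           # DataFrame, A1..A3 and col1..col3 inclusive
df.loc[   ['A1','A2','A3']  , 'col1': 'col3'  ] # DataFrame
df.loc[   ['A1','A2','A3']  ,['col1', 'col2']  ] # DataFrame (displayed below)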

Out[145]:

	col1	col2
A1	name1	10
A2	name2	20
A3	name3	30
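One difference between the `iloc` and `loc` calls above is easy to overlook: positional slices exclude the end, label slices include it. A minimal sketch, using a frame reconstructed from the printed output of the sample file (an assumption, since the file itself is not shown); `by_pos` and `by_label` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {'col0': [0, 1, 2, 3, 4],
     'col1': ['name0', 'name1', 'name2', 'name3', 'name4'],
     'col2': [0, 10, 20, 30, 40],
     'col3': [0, 1000, 2000, 3000, 4000]},
    index=['A0', 'A1', 'A2', 'A3', 'A4'],
)

by_pos = df.iloc[1:3, 1:3]                    # rows 1-2, columns 1-2 (end excluded)
by_label = df.loc['A1':'A3', 'col1':'col3']   # A1..A3, col1..col3 (end included)

print(by_pos.shape)     # (2, 2)
print(by_label.shape)   # (3, 3)

# The label slice that matches iloc[1:3, 1:3] stops one label earlier.
print(by_pos.equals(df.loc['A1':'A2', 'col1':'col2']))  # True
```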

Forget chained indexing ....

Use:   df.iloc[row , col]   df.loc[row , col]
Avoid: df[row][col]         df[col][row]

In [147]:

df[0:2][['col1', 'col2']]
df[['col1', 'col2']][0:2]

Out[147]:

	col1	col2
A0	name0	0
A1	name1	10
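Both chained forms above do return the expected rows and columns when reading, which is why they are tempting. The sketch below checks that they match a single `iloc` call, and then writes a value with one `loc` call, which is the form that reliably targets the original frame. The frame is reconstructed from the printed sample output (an assumption), and the value 99 is arbitrary:

```python
import pandas as pd

df = pd.DataFrame(
    {'col0': [0, 1, 2, 3, 4],
     'col1': ['name0', 'name1', 'name2', 'name3', 'name4'],
     'col2': [0, 10, 20, 30, 40],
     'col3': [0, 1000, 2000, 3000, 4000]},
    index=['A0', 'A1', 'A2', 'A3', 'A4'],
)

chained_a = df[0:2][['col1', 'col2']]   # rows first, then columns
chained_b = df[['col1', 'col2']][0:2]   # columns first, then rows
single = df.iloc[0:2, [1, 2]]           # one indexing step

print(chained_a.equals(single), chained_b.equals(single))  # True True

# Assignment: one .loc call addresses the cell in df itself.
df.loc['A0', 'col2'] = 99
print(df.loc['A0', 'col2'])  # 99
```

Chained assignment such as `df[0:2]['col2'] = 99` operates on an intermediate object, so depending on the pandas version it may warn or leave `df` unchanged.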

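df['value'] is used to select a column (the structure is df[column]).

df[] defaults to columns, so a single row label or position cannot be passed to it; rows can only be selected with a start:end slice or a boolean condition.

As a result, df[] has a different structure and cannot be used like an array (array[row, column]).

To use a DataFrame the way you use an array, use the iloc and loc indexers.

iloc selects by integer position (0-based, much like a list).

loc selects by row label: df.loc['row_label']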
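The rules above can be checked directly. A minimal sketch on a frame reconstructed from the printed sample output (an assumption); `col`, `rows`, and `mask` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {'col0': [0, 1, 2, 3, 4],
     'col1': ['name0', 'name1', 'name2', 'name3', 'name4'],
     'col2': [0, 10, 20, 30, 40],
     'col3': [0, 1000, 2000, 3000, 4000]},
    index=['A0', 'A1', 'A2', 'A3', 'A4'],
)

col = df['col1']               # column label       -> Series
rows = df[0:2]                 # start:end slice    -> first two rows
mask = df[df['col2'] >= 20]    # boolean condition  -> matching rows

try:
    df['A1']                   # a row label is searched among the columns
except KeyError as err:
    print('KeyError:', err)

# Rows by position or by label go through iloc / loc instead.
print(df.iloc[1]['col1'], df.loc['A1', 'col1'])
```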
