[Python] LSTM pyupbit tensorflow keras sklearn - 2. Predicting BITCOIN with an LSTM Model

 1. Loading the BITCOIN Dataset - using pyupbit

# Package imports
import os
import pyupbit as py
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
py.get_tickers(fiat='USD')
Output
['USDT-BTC',
 'USDT-ETH',
 'USDT-XRP',
 'USDT-ETC',
 'USDT-OMG',
 'USDT-ADA',
 'USDT-TUSD',
 'USDT-SC',
 'USDT-TRX',
 'USDT-BCH',
 'USDT-DGB',
 'USDT-DOGE',
 'USDT-ZRX',
 'USDT-RVN',
 'USDT-BAT']

py.get_current_price(['USDT-BTC','USDT-ETH'])
Output
 
 {'USDT-BTC': 20878.37330684, 'USDT-ETH': 1225.49151905}


tickers = ['USDT-BTC','USDT-ETH','USDT-XRP','USDT-ADA','USDT-LTC']
interval = 'minute60'
from tqdm import tqdm

coin_set = []
for ticker in tqdm(tickers):
    coin = py.get_ohlcv(ticker=ticker,count=20000,interval=interval,to='2022-01-01')
    coin_set.append(coin)
BTC = coin_set[0]
ETH = coin_set[1]
XRP = coin_set[2]
ADA = coin_set[3]
LTC = coin_set[4]
# Get the BITCOIN trading prices
BTC
Output
                                   open          high           low         close     volume          value
2019-09-06 21:00:00+00:00  10837.073450  10892.000000  10825.000000  10826.240000  13.028313  141488.509446
2019-09-06 22:00:00+00:00  10826.240000  10851.990000  10763.968517  10803.422690  26.329976  284421.274696
2019-09-06 23:00:00+00:00  10803.410000  10850.000000  10803.410000  10827.649995  11.192036  121318.460494
2019-09-07 00:00:00+00:00  10833.520000  10879.270000  10833.520000  10843.160000   7.257861   78828.225009
2019-09-07 01:00:00+00:00  10841.006105  10856.282376  10827.649995  10855.177197   5.160667   55951.010868
...                                 ...           ...           ...           ...        ...            ...
2022-05-01 04:00:00+00:00  38366.850000  38531.871679  38366.850000  38531.871679   0.017207     661.545067
2022-05-01 05:00:00+00:00  38453.599000  38453.599000  38379.200000  38379.200000   0.019895     764.321204
2022-05-01 06:00:00+00:00  38379.200000  38445.961000  38332.268000  38382.931000   0.153709    5905.244381
2022-05-01 07:00:00+00:00  38301.414000  38422.006800  38259.735000  38399.064000   0.520722   19982.987904
2022-05-01 08:00:00+00:00  38370.906000  38370.906000  37654.172000  37666.710000   2.583532   97615.410157

# Restrict the date range and extract the (hourly) closing price
BTC = BTC[BTC.index >= '2020-01-01']
BTC = BTC[BTC.index < '2022-01-01']
BTC = BTC[['close']]
BTC
Output
                               close
2020-01-01 00:00:00+00:00   7385.000
2020-01-01 02:00:00+00:00   7385.000
2020-01-01 03:00:00+00:00   7355.000
2020-01-01 06:00:00+00:00   7440.000
2020-01-01 10:00:00+00:00   7420.000
...                              ...
2021-12-31 19:00:00+00:00  48013.895
2021-12-31 20:00:00+00:00  48009.388
2021-12-31 21:00:00+00:00  48067.146
2021-12-31 22:00:00+00:00  48014.406
2021-12-31 23:00:00+00:00  48130.977
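The output above skips some hours (for example 01:00, 04:00 and 05:00 on 2020-01-01), so 24 consecutive rows do not always span exactly 24 hours. The following is a minimal sketch of one way to put the series on a regular hourly grid. It assumes that forward-filling the last known close is acceptable for this data, which is not something the original post does. The toy values and the names `full_index`, `filled` and `n_missing` are illustrative only.

```python
import pandas as pd

# Toy hourly closes with missing candles, mirroring the gaps in the output above
idx = pd.to_datetime(
    ["2020-01-01 00:00", "2020-01-01 02:00", "2020-01-01 03:00", "2020-01-01 06:00"],
    utc=True,
)
close = pd.DataFrame({"close": [7385.0, 7385.0, 7355.0, 7440.0]}, index=idx)

# Build the complete hourly grid between the first and last timestamps
full_index = pd.date_range(close.index.min(), close.index.max(), freq=pd.Timedelta(hours=1))

# Reindex onto the grid and carry the last known close forward (assumption)
filled = close.reindex(full_index).ffill()

# Number of hourly candles that were absent from the original data
n_missing = len(full_index) - len(close)
print(n_missing, len(filled))
```

Whether to fill, drop, or keep the gaps is a modelling choice; forward-filling only makes the window length correspond to wall-clock hours.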


2. Data Scaling
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# Define the column that the scaling is applied to.
scale_col = ['Close']
# Scaled values
BTC_scaled = scaler.fit_transform(BTC)
# Convert back to a DataFrame
BTC_scaled = pd.DataFrame(BTC_scaled)
BTC_scaled.columns = scale_col
BTC_scaled
Output
          Close
0      0.045203
1      0.045203
2      0.044733
3      0.046064
4      0.045751
...         ...
14861  0.681685
14862  0.681614
14863  0.682519
14864  0.681693
14865  0.683519
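With its default range of (0, 1), MinMaxScaler maps each value x to (x - min) / (max - min), and inverse_transform undoes that mapping. The sketch below reproduces the arithmetic with NumPy on a few toy prices; the helper names `minmax_fit`, `minmax_transform` and `minmax_inverse` are hypothetical and are not part of scikit-learn.

```python
import numpy as np

def minmax_fit(x):
    """Return the (min, max) pair that defines the scaling."""
    return float(x.min()), float(x.max())

def minmax_transform(x, lo, hi):
    """Scale values into [0, 1] using a previously fitted min and max."""
    return (x - lo) / (hi - lo)

def minmax_inverse(x_scaled, lo, hi):
    """Map scaled values back to the original price scale."""
    return x_scaled * (hi - lo) + lo

prices = np.array([7355.0, 7385.0, 7420.0, 7440.0])  # toy closing prices
lo, hi = minmax_fit(prices)
scaled = minmax_transform(prices, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
print(scaled, restored)
```

The post fits the scaler on the full series. Fitting it on the training portion only and reusing the same min and max for the test portion is a common alternative that keeps test-period extremes out of the preprocessing.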


3. Train set / Test set split
from sklearn.model_selection import train_test_split
train = BTC_scaled[:-72]
test = BTC_scaled[-72:]
print(train.shape,test.shape)
Output

 (14794, 1) (72, 1)

X_train=train[:-24]
y_train=train[24:]
print(X_train.shape,y_train.shape)
Output

 (14770, 1) (14770, 1)

X_test=test[:-24]
y_test=test[24:]
print(X_test.shape,y_test.shape)
Output

 (48, 1) (48, 1)

# window_size=24: one day of BITCOIN trading hours (00:00-24:00)
def make_dataset(data,label,window_size=24):
    feature_list=[]
    label_list=[]
    for i in range(len(data)-window_size):
        feature_list.append(np.array(data[i:i+window_size]))
        label_list.append(np.array(label.iloc[i]))
    return np.array(feature_list), np.array(label_list)
# train dataset
X_train, y_train = make_dataset(X_train,y_train,24)

# Create the train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train,y_train, test_size = 0.2)

# test dataset
X_test,y_test = make_dataset(X_test,y_test,24)
X_test.shape,y_test.shape
Output

 ((24, 24, 1), (24, 1))


For the test set, the middle dimension holds the 24 hourly values that make up each window.

Because only one quantity (Close) is predicted, the number of features is set to one.
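The shapes above follow from sliding a fixed-length window over the series and using the value right after each window as its label. Here is a minimal NumPy-only sketch of that idea on a toy series; `make_windows` is a hypothetical helper that combines the offset slicing and `make_dataset` into one step, not the function used in this post.

```python
import numpy as np

def make_windows(series, window_size):
    """Turn a 1-D series into (samples, window_size, 1) inputs and (samples, 1) labels.

    Window i covers series[i : i + window_size]; its label is series[i + window_size].
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - window_size
    X = np.stack([series[i:i + window_size] for i in range(n)])[..., np.newaxis]
    y = series[window_size:].reshape(-1, 1)
    return X, y

# Toy series 0..9 with a window of 3: the first window is [0, 1, 2] and its label is 3
X, y = make_windows(np.arange(10), window_size=3)
print(X.shape, y.shape)

# 48 values with a window of 24 give 24 samples, matching the test-set shape above
X48, y48 = make_windows(np.arange(48), window_size=24)
print(X48.shape, y48.shape)
```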


4. Modeling - LSTM Model

import tensorflow as tf
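from tensorflow import keras
# Import everything from tensorflow.keras so the callbacks and layers match the tf.keras model below
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import LSTM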

earlystopping = EarlyStopping(patience=10,verbose=1)
checkpoint = ModelCheckpoint('model.h5', monitor='val_loss', verbose=1, save_best_only=True, mode='auto')

model=tf.keras.models.Sequential([
    tf.keras.layers.LSTM(192,return_sequences=True,input_shape=(X_train.shape[1],1)),
    tf.keras.layers.LSTM(96,return_sequences=False),
    tf.keras.layers.Dense(48),
    tf.keras.layers.Dense(1)
])
model.compile(loss='mean_squared_error', optimizer='adam')
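# Pass the checkpoint callback as well so the best model (lowest val_loss) is saved to model.h5
hist=model.fit(X_train,y_train,epochs=100,batch_size=5,validation_data=(X_val,y_val),callbacks=[earlystopping,checkpoint])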


5. Creating the Valid Loss plot

import matplotlib.pyplot as plt

str_plt_style = 'bmh'
plt.style.use([str_plt_style])
plt.rcParams["figure.figsize"] = (10,6) 
plt.rcParams["font.size"]=11

plt.title('Valid Loss')
plt.plot(hist.history['loss'],label='train loss')
plt.plot(hist.history['val_loss'],label='valid loss')
plt.legend()
plt.show()

Project 2 - Valid Loss plot (BITCOIN)


6. Creating the hourly BITCOIN price prediction plot

y_pred=model.predict(X_test)

# Convert back to the original price scale
y_pred = scaler.inverse_transform(y_pred)
y_test = scaler.inverse_transform(y_test)
str_plt_style = 'bmh'
plt.style.use([str_plt_style])
plt.rcParams["figure.figsize"] = (16,9) 
plt.rcParams["font.size"]=11

plt.title('BITCOIN')
plt.plot(y_test,label='actual')
plt.plot(y_pred,label='prediction')
plt.legend()
plt.show()

Project 2 - BITCOIN plot


7. Creating the Shiftin' plot

# The 24 predictions correspond to test rows 24..47 (the labels built by make_dataset)
result = pd.DataFrame(index=test.index[24:48])
result.reset_index(inplace=True)

result['y_pred'] = y_pred.ravel()
str_plt_style = 'bmh'
plt.style.use([str_plt_style])
plt.rcParams["figure.figsize"] = (16,9) 
plt.rcParams["font.size"]=11

plt.title('BITCOIN')
plt.plot(y_test,label='actual')

shift = result.y_pred.shift(-1).values
plt.plot(shift,label="shiftin'",color='orchid')
plt.legend()
plt.show()

Project 2 - BITCOIN Shiftin' plot

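After training, the BITCOIN predictions also tend to be shifted relative to the actual values.

We suspect that the network is mimicking the test data, echoing recently observed values with a lag rather than anticipating the next move.

(The Shiftin' plot shows how closely the two trend lines match once the prediction is shifted by one step.)

➟ Next, we plan to tackle this problem with a Seq2Seq model that uses multi-step forecasting.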
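One way to check the shifting hypothesis numerically is to compare the model against a persistence baseline, which simply predicts that the next value equals the current one. The sketch below uses made-up numbers, not the BITCOIN results; `model_pred` is a hypothetical prediction that echoes the previous actual value. If a model scores no better than persistence and its error collapses after shifting it one step earlier, that supports the mimicking interpretation.

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two equally long sequences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Toy values (assumption): the prediction at step t repeats the actual value at t-1
actual = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])
model_pred = np.array([99.0, 100.0, 102.0, 101.0, 105.0, 107.0])

# Persistence baseline: the forecast for step t is the actual value at t-1
persistence = actual[:-1]

rmse_model = rmse(actual[1:], model_pred[1:])
rmse_persistence = rmse(actual[1:], persistence)

# Shift the prediction one step earlier, as the Shiftin' plot does with shift(-1)
rmse_shifted = rmse(actual[:-1], model_pred[1:])

print(rmse_model, rmse_persistence, rmse_shifted)
```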


8. Measuring accuracy

from sklearn.metrics import mean_squared_error 

# RMSE
MSE = mean_squared_error(y_test, y_pred) 
RMSE = np.sqrt(MSE)

# MAPE
def MAPE(y_test, y_pred):
    return np.mean(np.abs((y_test - y_pred) / y_test)) * 100 
print('RMSE =',round(RMSE,2))
print('MAPE =',round(MAPE(y_test, y_pred),2),'%')
Output

 RMSE = 730.03
 MAPE = 1.44 %
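To make the two metrics concrete, here is a small worked example on toy numbers (not the BITCOIN data). RMSE is the square root of the mean squared error and is expressed in price units, while MAPE is the mean absolute error relative to the actual value, expressed in percent.

```python
import numpy as np

# Toy actual values and predictions (illustrative only)
y_true = np.array([100.0, 200.0, 400.0])
y_hat = np.array([110.0, 190.0, 400.0])

errors = y_true - y_hat  # [-10, 10, 0]

# RMSE: sqrt of the mean of squared errors = sqrt((100 + 100 + 0) / 3)
rmse_value = float(np.sqrt(np.mean(errors ** 2)))

# MAPE: mean of |error / actual| in percent = (10% + 5% + 0%) / 3
mape_value = float(np.mean(np.abs(errors / y_true)) * 100)

print(round(rmse_value, 2), round(mape_value, 2))
```

MAPE divides by the actual values, so it is undefined when any actual value is zero; that is not a concern for prices on the scale seen here.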


