cluster.data package

Submodules

cluster.data.data_node module

class cluster.data.data_node.DataNode[source]

Bases: cluster.common.common_node.WorkFlowCommonNode

load_data(node_id, parm='all')[source]
multi_load_data(node_id, parm='all')[source]
run(conf_data)[source]

cluster.data.data_node_frame module

class cluster.data.data_node_frame.DataNodeFrame[source]

Bases: cluster.data.data_node.DataNode

DataNode configuration. NULL handling matters here: Category columns default to "" and Continuous columns default to 0.0.

check_eval_node_for_wdnn(_conf_data)[source]
Needed to fetch the Category data of the Eval Data.
Returns data_conf_node_id when the network is a WDNN.
Args:
params:
  • _conf_data : workflow info for the nnid
Returns:
data_conf_node_id : ID of the DataConf
create_example_pandas(row, CONTINUOUS_COLUMNS, CATEGORICAL_COLUMNS, label, label_type)[source]
Converts a Pandas Dataframe row into a TFRecord example (for WDNN); a minimal sketch follows this entry.
Args:
params:
  • row : Dataframe row
  • CONTINUOUS_COLUMNS
  • CATEGORICAL_COLUMNS
  • label
  • label_type
Returns:
tfrecord example
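
A minimal sketch of what such a conversion can look like, using the public tf.train API. The NULL defaults follow the class note above ("" for Category, 0.0 for Continuous); the 'CATEGORICAL' label-type value is an assumption, not the module's confirmed constant::

    import pandas as pd
    import tensorflow as tf

    def row_to_example(row, continuous_columns, categorical_columns, label, label_type):
        """Build a tf.train.Example from one DataFrame row (illustrative sketch)."""
        feature = {}
        for col in continuous_columns:
            # Continuous values become float features; NULLs default to 0.0.
            value = 0.0 if pd.isnull(row[col]) else float(row[col])
            feature[col] = tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
        for col in categorical_columns:
            # Categorical values become byte features; NULLs default to "".
            text = '' if pd.isnull(row[col]) else str(row[col])
            feature[col] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[text.encode()]))
        # Encode the label according to its type ('CATEGORICAL' is assumed here).
        if label_type == 'CATEGORICAL':
            feature[label] = tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[str(row[label]).encode()]))
        else:
            feature[label] = tf.train.Feature(
                float_list=tf.train.FloatList(value=[float(row[label])]))
        return tf.train.Example(features=tf.train.Features(feature=feature))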

create_hdf5(data_path, dataframe)[source]

Create an HDF5 file from the given dataframe.
Parameters: data_path, dataframe
Returns: dataframe
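
For reference, pandas can write a dataframe to HDF5 directly (requires the PyTables package; the key name 'table' below is illustrative)::

    import pandas as pd

    def create_hdf5_sketch(data_path, dataframe):
        """Write the dataframe to an HDF5 file and return it (sketch)."""
        dataframe.to_hdf(data_path, key='table', mode='w')
        return dataframe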

create_tfrecords_file(output_file, skip_header, df_csv_read, label, label_type)[source]

Creates a TFRecords file for the given input data and example transformation function.

dataconf_eval_time_check(_wf_data_conf_node, _node_name)[source]

Even when a data conf already exists, add only the unique values for an eval node.
Parameters: data_dfconf_list (e.g. nn00001_1_dataconf_node)
Returns: True

dataconf_first_time_check(_wf_data_conf_node, _node_name)[source]

Update only when data_conf is empty or the node is a DataNode.
Parameters: data_dfconf_list (e.g. nn00001_1_dataconf_node)
Returns: True

get_eval_node_file_list(conf_data)[source]
Find the Eval Data Node, resolve its path, and read the CSV.
Store the result in cell_feature of self.data_conf.
Args:
params:
  • _conf_data : workflow info for the nnid
Returns:
None
load_csv_by_pandas(data_path)[source]

Read a CSV file with pandas.
Parameters: data_path
Returns: data_path

load_data(node_id='', parm='all')[source]

Load the training data.
Parameters: node_id, parm
Returns:

make_column_types(df, node_id, data_dfconf_list)[source]

Read the CSV and compute each column's type, storing it in data_conf (only when data_conf is empty).
Parameters: df, node_id, data_dfconf_list
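
A plausible sketch of dtype-based type inference; treating object-dtype columns as categorical is an assumption, since the module's exact rule is not documented::

    def infer_column_types(df):
        """Split columns into continuous and categorical by dtype (assumed rule)."""
        continuous, categorical = [], []
        for col in df.columns:
            if df[col].dtype == object:  # string columns -> categorical (assumption)
                categorical.append(col)
            else:                        # numeric columns -> continuous
                continuous.append(col)
        return continuous, categorical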

make_continuous_category_list(cell_feature)[source]

Build the lists that separate Continuous and Categorical columns for Example creation.

make_drop_duplicate(_df_csv_read_ori, _drop_duplicate, _label)[source]

Drop the entire row when the values of all columns except the Label are duplicated (see the sketch after this entry).
Args:
params:
  • _df_csv_read_ori : pandas dataframe
  • _drop_duplicate
  • _label
Returns:
Deduplicated Dataframe
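
A minimal pandas sketch of that rule, assuming _drop_duplicate is a flag that enables the behavior::

    def drop_duplicate_rows(df, drop_duplicate, label):
        """Remove rows duplicated on every column except the label (sketch)."""
        if not drop_duplicate:
            return df
        subset = [col for col in df.columns if col != label]
        return df.drop_duplicates(subset=subset)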
make_label_values(_data_dfconf_list, _df_csv_read)[source]

Store the label's unique values in the DataConf (a short helper sketch follows this entry).
Args:
params:
  • _data_dfconf_list : workflow info for the nnid
  • _df_csv_read : Dataframe (train, eval)
Returns:
_label : the label column value, _label_type : the label type
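
Extracting the distinct label values is a one-liner in pandas; a hypothetical helper::

    def extract_label_values(df, label):
        """Return the distinct values of the label column (order not guaranteed)."""
        return df[label].unique().tolist()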
make_preprocessing_pandas(_df_csv_read_ori, _preprocessing_type, _label)[source]
Preprocess the Pandas dataframe with scikit-learn (see the sketch after this entry).
The label column must not be preprocessed.
Args:
params:
  • _preprocessing_type: ['scale', 'minmax_scale', 'robust_scale', 'normalize', 'maxabs_scale']
  • _df_csv_read_ori : pandas dataframe
  • _label
Returns:
Preprocessing DataFrame
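
A minimal sketch of that flow, assuming the preprocessing type names map one-to-one onto the scikit-learn functions of the same name::

    from sklearn import preprocessing

    def preprocess_dataframe(df, preprocessing_type, label):
        """Apply a scikit-learn scaler to every column except the label (sketch)."""
        scalers = {
            'scale': preprocessing.scale,
            'minmax_scale': preprocessing.minmax_scale,
            'robust_scale': preprocessing.robust_scale,
            'normalize': preprocessing.normalize,
            'maxabs_scale': preprocessing.maxabs_scale,
        }
        feature_cols = [col for col in df.columns if col != label]
        # Scale only the feature columns, leaving the label untouched.
        df[feature_cols] = scalers[preprocessing_type](df[feature_cols])
        return df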
make_unique_value_each_column(df, node_id)[source]
Find the categorical columns in the Dataframe and return the number of unique values in each (a sketch follows this entry).
Args:
params:
  • df : dataframe
  • node_id: nnid
Returns:
json
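
A sketch of what this computation can look like, returning a JSON string; the object-dtype rule for spotting categorical columns is an assumption::

    import json

    def unique_values_per_column(df):
        """Count distinct values for each categorical (object-dtype) column."""
        counts = {col: int(df[col].nunique())
                  for col in df.columns if df[col].dtype == object}
        return json.dumps(counts)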

multi_load_data(node_id, parm='all')[source]
preprocess_data(input_data)[source]
Parameters: input_data
Returns:
run(conf_data)[source]

Run the Data Node: creates both the HDF5 file and the TFRecord in one pass.
Parameters: conf_data
Returns: dataframe

save_tfrecord(csv_data_file, store_path, skip_header, df_csv_read, label, label_type)[source]

Creates a TFRecords file for the given input data and example transformation function.

set_dataconf_for_checktype(df, node_id, data_dfconf_list)[source]

Read the CSV and compute each column's type, storing it in data_conf (only when data_conf is empty). For category columns, collect the unique values and store them in cell_feature_unique (for Keras).

Parameters:
  • df, node_id, data_dfconf_list
set_dataconf_for_labels(df, label)[source]

Read the CSV and extract the distinct label values.
Parameters: df, label

set_default_dataconf_from_csv(wf_data_config, node_id, data_conf)[source]
Because of TFRecord, always check the column types and save only when needed.
Parameters:
  • wf_data_config, node_id, data_conf

src_local_handler(conf_data)[source]
Convert the CSV to H5 and TFRecord.

Data Node for Dataframe:
  1. WDNN case: while parsing the Pandas dataframe, classify each column as Categorical or Continuous and record it in the DataConf (skipped for eval data; based on the DataNode). For Category columns, store the unique values in the DataConf. If the label type is Categorical, store the label's unique values in the DataConf. Preprocess the Pandas dataframe according to _preprocess_type. When _multi_node_flag is True, also create the TfRecord (see the sketch after this entry).
  2. Non-WDNN case: create only the H5 file.
Args:
params:
  • conf_data : nn_info
Returns:
None
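
The described control flow as a hedged sketch; the helper names (is_wdnn, save_as_tfrecord, save_as_h5) and the conf_data keys are hypothetical placeholders, not the module's real API::

    def src_local_handler_sketch(self, conf_data):
        """Illustrative flow: CSV -> H5 always, TFRecord only for multi-node WDNN."""
        df = self.load_csv_by_pandas(conf_data['data_path'])  # key name assumed
        if self.is_wdnn(conf_data):  # hypothetical helper
            self.make_column_types(df, conf_data['node_id'], conf_data['dfconf'])
            df = self.make_preprocessing_pandas(df, conf_data['preprocess_type'],
                                                conf_data['label'])
            if self._multi_node_flag:
                self.save_as_tfrecord(df, conf_data)  # hypothetical helper
        self.save_as_h5(df, conf_data)  # hypothetical helper
        return None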

cluster.data.data_node_image module

class cluster.data.data_node_image.DataNodeImage[source]

Bases: cluster.data.data_node.DataNode

download_file_from_google_drive(URL, destination)[source]
get_confirm_token(response)[source]
image_convert(sess, dataconf, img, filename, forder=None)[source]
load_data(node_id='', parm='all')[source]
process_predicts(predicts)[source]
run(conf_data)[source]
save_response_content(response, destination)[source]
yolo_detection()[source]

cluster.data.data_node_iob module

class cluster.data.data_node_iob.DataNodeIob[source]

Bases: cluster.data.data_node.DataNode, cluster.common.neural_common_bilismcrf.BiLstmCommon

load_data(node_id='', parm='all')[source]

Load the training data.
Parameters: node_id, parm
Returns:

run(conf_data)[source]

Run at train time: the data node collects data from the source, preprocesses it, and stores it on NAS.
Parameters: conf_data
Returns:

src_local_handler(conf_data)[source]

Read data from the local file system.
Parameters: conf_data
Returns:

cluster.data.data_node_raw module

class cluster.data.data_node_raw.DataNodeRaw[source]

Bases: cluster.data.data_node.DataNode

load_data(node_id='', parm='all')[source]

Load the training data.
Parameters: node_id, parm
Returns:

run(conf_data)[source]
Parameters: conf_data
Returns:
src_local_handler(conf_data)[source]
Parameters: conf_data
Returns:

cluster.data.data_node_text module

class cluster.data.data_node_text.DataNodeText[source]

Bases: cluster.data.data_node.DataNode

load_data(node_id='', parm='all')[source]

Load the training data.
Parameters: node_id, parm
Returns:

run(conf_data)[source]
Parameters: conf_data
Returns:
src_local_handler(conf_data)[source]
Parameters: conf_data
Returns:

Module contents