Python version: 3.6.5.
Download from: GDrive (24 MB)
Please read the project specifics below:
1. Project objective - develop a system that extracts data from csv data files, transforms it, and exports the aggregated data to a predefined folder. At present the system can do the following:
- extract the data from one/multiple/all csv files in "data" folder
- extract selected variables predefined in a csv template stored in folder "templates"
- automatically detect whether the data in a given variable (the variable name is the name in the first row of the csv file) is of integer, float or string type
- produce file reports with information about the data available in the file
- record blank cells in the data as None
- execute custom Python scripts
These are the core functionalities of the system; each of them will be described in more detail when the individual functions are explained.
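To make the type detection and the handling of blank cells more concrete, here is a minimal sketch. It is not the code from SnakeData.py: the names `detect_type` and `load_columns` and the sample data are hypothetical, and the rule "try integer first, then float, otherwise string" is an assumption about how such detection could work.

```python
import csv
import io


def detect_type(values):
    """Return 'integer', 'float' or 'string' for a list of raw cell strings.

    Blank cells are ignored when deciding the type (assumption).
    """
    non_blank = [v for v in values if v.strip() != ""]
    if not non_blank:
        return "string"
    # Try the strictest type first: every value must convert without error.
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in non_blank:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def load_columns(text, wanted=None):
    """Read csv text into a dict of columns; blank cells become None.

    `wanted` plays the role of a template: only the listed variables are kept.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # variable names come from the first row
    rows = list(reader)
    casters = {"integer": int, "float": float, "string": str}
    result = {}
    for idx, name in enumerate(header):
        if wanted is not None and name not in wanted:
            continue
        raw = [row[idx] if idx < len(row) else "" for row in rows]
        cast = casters[detect_type(raw)]
        result[name] = [cast(v) if v.strip() != "" else None for v in raw]
    return result


# Hypothetical sample data, not taken from the project's files.
sample = "Date,Open,Volume\n2018-01-02,10.5,100\n2018-01-03,,250\n"
data = load_columns(sample, wanted=["Open", "Volume"])
print(data)  # {'Open': [10.5, None], 'Volume': [100, 250]}
```

The dictionary-of-columns shape is chosen here only because several functions in the list below take a `dict` argument; the actual internal format may differ.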
2. Structure and folders
2.1. Structure - my idea was to have a standalone system for data processing & management. That is why I created a terminal (console.exe) which is supposed to work independently, i.e. whether or not you have Python installed. The core files you need to run the system are stored in the folder "main" and are as below:
console.py and SnakeData.py are not required to run the system, but their code is essential, which is why I kept them in the zip file. meta.xml is essential for the system to function, as it keeps the folder hierarchy & structure:
"db1" is a basic database structure with the minimum sub-folders for the system to work. They will be described in the next point.
2.2 Folders - after unzipping the file you will find one folder called "main". Consider this the root folder of the entire system: the terminal/console will not work if the folder is not called main, and all database structures should be created in this folder. The only sub-folder available at the moment is "db1", the example database structure mentioned above. The sub-folders in db1 are:
- data - as the name suggests, it holds the raw data files to be processed; it contains 4 csv files with stock price data downloaded from Yahoo Finance.
- exports - all exports should be stored here
- report - all file/variable reports should be stored here
- scripts - all executable scripts should be stored here
- templates - all data templates should be stored here
- meta.xml - not a folder but an xml map file holding the type of the folder and its sub-folders.
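The folder layout described above can be sketched in a few lines. This is only an illustration under assumptions: `make_db_layout` is a hypothetical helper, not the project's create_db(), and it deliberately does not write meta.xml because the schema of that file is not shown here. The example runs in a temporary directory so it does not touch a real "main" folder.

```python
from pathlib import Path
import tempfile

# Sub-folders every database structure is described as having.
SUBFOLDERS = ("data", "exports", "report", "scripts", "templates")


def make_db_layout(root, db_name):
    """Create <root>/<db_name> with the five sub-folders listed above."""
    db_path = Path(root) / db_name
    for sub in SUBFOLDERS:
        (db_path / sub).mkdir(parents=True, exist_ok=True)
    return db_path


with tempfile.TemporaryDirectory() as tmp:
    main = Path(tmp) / "main"  # stands in for the real root folder
    db = make_db_layout(main, "db1")
    created = sorted(p.name for p in db.iterdir() if p.is_dir())
    print(created)  # ['data', 'exports', 'report', 'scripts', 'templates']
```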
3. Files
3.1 console.py - here is the code for the terminal; it will be reviewed in detail in a separate post.
3.2 SnakeData.py - this is the heart of the system and stores all the Python functions I have made so far. Each function will be reviewed in a separate post.
Functions list:
var_load_xlsx(file,var_name)
single_code_filter(flt_list,flrd_list,id_list,flt_code)
var_load_csv(file,var_name,int_)
var_list_export(file,csv_,xlsx_)
range_checker(id_var,num_var,coded,flt_var,range_start,range_end)
report(var,report_name)
var_cluster(vars,strings)
cluster_rp(cluster,report_name)
cluster_to_numpy(cluster)
col_data(file)
file_report(file)
read_all_data(template)
read_custom_data(template,file_list)
select_db(folder)
cd(folder)
create_db(db_name)
export_to_csv(dict,file_name)
row_stats(dict)
col_stats(dict)
tab_view(dict,colwid)
create_template(temp_name,var_list)
sel_rows(dict,var1,cond,var2,colprint)
run_script(script_file)
Future upgrades:
Users - create a user management system, i.e. 1 default admin user who can register/delete other users with a lower level of rights; user details to be stored in an encrypted "users.csv" (or another encrypted file type if it is impossible to encrypt csv files).
Syntax errors - provide more detailed information about syntax errors from the console.
xlsx_to_csv() - develop a function to open a given Excel file and export all of its sheets into separate csv files.
scan_root() - develop a function to scan the main folder and print a list of all available databases.
scan_data() - develop a function to scan the data folder in a selected database and print a list of all available data files.
remove_db() - develop a function to remove a database of choice.
rm_key() - develop a function to clear some or all of the keys (dictionary keys act as columns) in a dictionary.
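As a rough idea of how three of these planned functions might look, here is a minimal sketch. None of this exists in the system yet, so everything below is an assumption: in particular, that a "database" is any sub-folder of main that contains a meta.xml file, that data files are the *.csv files in its "data" folder, and that rm_key() clears the whole dictionary when no keys are given. The functions return lists instead of printing so they are easy to check.

```python
from pathlib import Path
import tempfile


def scan_root(main_folder):
    """List sub-folders of the root that look like databases.

    Assumption: a database is a folder that contains a meta.xml file.
    """
    root = Path(main_folder)
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and (p / "meta.xml").is_file()
    )


def scan_data(db_folder):
    """List the csv files in the 'data' folder of the given database."""
    data_dir = Path(db_folder) / "data"
    return sorted(p.name for p in data_dir.glob("*.csv"))


def rm_key(table, keys=None):
    """Remove the given keys (columns) from the dict, or all if keys is None."""
    if keys is None:
        table.clear()
    else:
        for key in keys:
            table.pop(key, None)  # ignore keys that are not present
    return table


# Demo in a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    main = Path(tmp) / "main"
    (main / "db1" / "data").mkdir(parents=True)
    (main / "db1" / "meta.xml").write_text("<placeholder/>")
    (main / "db1" / "data" / "prices.csv").write_text("Date,Close\n")
    (main / "notes").mkdir()  # no meta.xml, so not treated as a database
    databases = scan_root(main)
    data_files = scan_data(main / "db1")

table = rm_key({"Date": ["2018-01-02"], "Open": [10.5], "Close": [10.9]}, ["Open"])
print(databases, data_files, sorted(table))
```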