This README explains how to use a Python script that downloads data files from S3 storage and extracts the values for given keys within a specified time range.
The script requires Python 3 and uses the s3cmd tool to access the data in cloud storage.
It also shows how to configure s3cmd by creating a .s3cfg file with your access credentials.
############ Create the .s3cfg file in your home directory ################
nano ~/.s3cfg
Copy these lines into the file:
[default]
host_base = sos-ch-dk-2.exo.io
host_bucket = %(bucket)s.sos-ch-dk-2.exo.io
access_key = <YOUR_ACCESS_KEY>
secret_key = <YOUR_SECRET_KEY>
use_https = True
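As an optional sanity check, the Python sketch below parses the text of a .s3cfg file and reports which of the settings listed above are missing or empty. It is not part of extractS3data.py, and the helper name missing_s3cfg_settings is hypothetical. Interpolation is disabled because host_bucket contains the literal s3cmd placeholder %(bucket)s. Python's configparser would otherwise try to expand it.

```python
import configparser

# Settings the [default] section above is expected to contain.
REQUIRED_SETTINGS = ("host_base", "host_bucket", "access_key", "secret_key", "use_https")


def missing_s3cfg_settings(config_text):
    """Return the required settings that are absent or empty in [default]."""
    # interpolation=None keeps "%(bucket)s" as literal text instead of
    # raising an interpolation error when the value is read.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("default"):
        return list(REQUIRED_SETTINGS)
    section = parser["default"]
    # section.get(name, "") falls back to "" when a setting is missing.
    return [name for name in REQUIRED_SETTINGS if not section.get(name, "").strip()]
```

For example, after `from pathlib import Path`, the call missing_s3cfg_settings((Path.home() / ".s3cfg").read_text()) returns an empty list when every setting is present.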
############ s3cmd installation ################
Install s3cmd, which is used to retrieve data from our cloud storage:
sudo apt install s3cmd
############ Python 3 installation ################
To check whether Python 3 is already installed, run:
python3 --version
To install it, run these commands:
1) sudo apt update
2) sudo apt install python3
3) python3 --version (to check that Python 3 was installed correctly)
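To confirm both prerequisites at once, you can use a small Python sketch like the one below. It is a convenience only and is not part of the tool, and the name missing_tools is hypothetical. It checks that python3 and s3cmd can be found on your PATH.

```python
import shutil

# Command-line tools this README asks you to install.
REQUIRED_TOOLS = ("python3", "s3cmd")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the PATH."""
    # shutil.which returns None when an executable is not on the PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("python3 and s3cmd are both available.")
```

Run it with python3. It prints either the tools that are still missing or a confirmation that both are available.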
############ Run extractS3data.py ################
usage: extractS3data.py [-h] --keys KEYS --bucket-number BUCKET_NUMBER start_timestamp end_timestamp
KEYS: a key can be a single word or a path. For example:
/DcDc/Devices/2/Status/Dc/Battery/voltage ==> returns the DC battery voltage of DcDc device 2.
Dc/Battery/voltage ==> returns the DC battery voltage of every DcDc device (including the average voltage across all DcDc devices).
voltage ==> returns every voltage of every device in the Salimax.
BUCKET_NUMBER: the bucket number of the installation.
start_timestamp, end_timestamp: Unix timestamps in seconds (10 digits).
start_timestamp must be smaller than end_timestamp.
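The key rules and timestamp constraints above can be sketched in Python. This is only an illustration under assumptions. The README does not say how the real script matches keys, so key_matches assumes that a key selects every path ending with it, compared on whole path segments. The helpers to_timestamp and validate_range are hypothetical: the first produces a 10-digit timestamp from a date, reading the date as UTC, and the second checks a range.

```python
from datetime import datetime, timezone


def key_matches(key, path):
    """Assumed rule: a key selects every path that ends with it,
    compared on whole segments ("voltage" does not match "Undervoltage")."""
    return ("/" + path.strip("/")).endswith("/" + key.strip("/"))


def to_timestamp(text):
    """Convert 'YYYY-MM-DD HH:MM:SS' (read as UTC) to a 10-digit Unix timestamp."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def validate_range(start, end):
    """Raise ValueError unless both timestamps have 10 digits and start < end."""
    for name, value in (("start_timestamp", start), ("end_timestamp", end)):
        if len(str(value)) != 10:
            raise ValueError(f"{name} must have 10 digits, got {value}")
    if start >= end:
        raise ValueError("start_timestamp must be smaller than end_timestamp")
```

For example, to_timestamp("2025-06-04 18:45:21") gives 1749062721, which is the start timestamp in the example command. Similarly, to_timestamp("2025-06-05 06:46:41") gives its end timestamp, 1749106001.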
PS: The data is downloaded to a folder named S3cmdData_{Bucket_Number}, which is created if it does not exist.
If the folder already exists, the script downloads only those files in the requested time range that are not already present.
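The incremental-download behaviour can be illustrated with the following Python sketch. It is not the script's code, and files_to_download is a hypothetical helper. The list of wanted file names is a placeholder for however the script actually decides which files cover the requested range.

```python
from pathlib import Path


def files_to_download(bucket_number, wanted, base_dir="."):
    """Return the wanted file names not yet present in S3cmdData_{bucket_number}.

    The folder is created if it does not exist. `wanted` is a hypothetical
    list of file names for the requested time range.
    """
    folder = Path(base_dir) / f"S3cmdData_{bucket_number}"
    # Create the folder (and any parents) unless it already exists.
    folder.mkdir(parents=True, exist_ok=True)
    # Keep only the names that have no local copy yet.
    return [name for name in wanted if not (folder / name).exists()]
```

Running the same request twice therefore fetches nothing new the second time, because every wanted file already exists locally.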
Example command:
python3 extractS3data.py 1749062721 1749106001 --keys GridMeter/Ac/Power/Active --bucket-number 12 --product_name=SodistoreMax
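You may want to start extractions from Python rather than typing them, for example to loop over several time windows. A hypothetical helper such as build_command, shown below, assembles the same arguments as the example command above. It only builds and prints the command, and it assumes that extractS3data.py is in the current directory.

```python
import shlex


def build_command(start, end, keys, bucket_number, product_name):
    """Assemble the argument list for extractS3data.py, mirroring the example above."""
    return [
        "python3", "extractS3data.py", str(start), str(end),
        "--keys", ",".join(keys),  # several keys are joined with commas
        "--bucket-number", str(bucket_number),
        f"--product_name={product_name}",
    ]


cmd = build_command(1749062721, 1749106001, ["GridMeter/Ac/Power/Active"], 12, "SodistoreMax")
# shlex.join renders the list as a shell command line for display.
print(shlex.join(cmd))
```

To run it, pass the list to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids shell-quoting problems with the key paths.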
################################ EXTENDED FEATURES FOR MORE ADVANCED USAGE ################################
1) Multiple Keys Support:
The script can extract data for several keys at once: pass one or more keys, separated by commas, with the --keys parameter.
This lets you extract different metrics or parameters, from the same or different CSV files, within the specified time range in a single run.
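The following minimal sketch shows how a comma-separated --keys value can be turned into a list. It assumes the script uses Python's argparse: the usage line above has argparse's format, but this README does not show the script's code. The comma_separated type function is hypothetical. It trims spaces and ignores empty items.

```python
import argparse


def comma_separated(value):
    """Split a --keys value on commas, trimming spaces and dropping empty items."""
    keys = [part.strip() for part in value.split(",") if part.strip()]
    if not keys:
        raise argparse.ArgumentTypeError("--keys needs at least one key")
    return keys


parser = argparse.ArgumentParser(description="Sketch of a --keys option")
parser.add_argument("--keys", type=comma_separated, required=True)
# Parse an explicit argument list here instead of sys.argv.
args = parser.parse_args(["--keys", "AcDc/SystemControl/ResetAlarmsAndWarnings, AcDc/Devices/1/Status/Ac/L1/Voltage"])
print(args.keys)
```

Here args.keys becomes a two-element list, and the space after the comma is removed.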
2) Dynamic Header Generation:
The script builds the header of the output CSV file from the keys provided: each column is named after the key used to extract it,
so the output shows at a glance which data each column contains.
3) Advanced Data Processing Capabilities:
Booleans as Numbers: The --booleans_as_numbers flag converts boolean values (True/False) to numbers (1/0). This is useful when the output is
processed numerically, for example plotted or averaged.
Example command:
python3 extractS3data.py 1749062721 1749106001 --keys AcDc/SystemControl/ResetAlarmsAndWarnings,AcDc/Devices/1/Status/Ac/L1/Voltage --bucket-number 12 --product_name=SodistoreMax --booleans_as_numbers
This command extracts data for the AcDc/SystemControl/ResetAlarmsAndWarnings and AcDc/Devices/1/Status/Ac/L1/Voltage keys from bucket number 12 between the specified timestamps, with boolean values converted to numbers.
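The header generation and the --booleans_as_numbers conversion described above can be illustrated together in Python. This is a sketch under assumptions, not the script's implementation. The functions write_extracted_csv and to_number_if_bool are hypothetical. The leading "timestamp" column and the (timestamp, {key: value}) row layout are invented for this example, and the sample values are illustrative rather than real measurements.

```python
import csv
import io


def to_number_if_bool(value):
    """Map True/False (bools or their text forms) to 1/0; leave other values unchanged."""
    if value is True or value == "True":
        return 1
    if value is False or value == "False":
        return 0
    return value


def write_extracted_csv(keys, rows, booleans_as_numbers=False):
    """Return CSV text whose header is 'timestamp' followed by one column per key."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", *keys])  # header built from the keys
    for timestamp, values in rows:
        cells = [values.get(key, "") for key in keys]  # blank if a key is missing
        if booleans_as_numbers:
            cells = [to_number_if_bool(cell) for cell in cells]
        writer.writerow([timestamp, *cells])
    return buffer.getvalue()


keys = ["AcDc/SystemControl/ResetAlarmsAndWarnings", "AcDc/Devices/1/Status/Ac/L1/Voltage"]
# Illustrative values only, not real measurements.
rows = [(1749062721, {keys[0]: "False", keys[1]: 230.0})]
print(write_extracted_csv(keys, rows, booleans_as_numbers=True))
```

With booleans_as_numbers=True, the output starts with the header line timestamp,AcDc/SystemControl/ResetAlarmsAndWarnings,AcDc/Devices/1/Status/Ac/L1/Voltage, followed by 1749062721,0,230.0. Without the flag, the second column keeps the text False.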