Reputation: 47
I am trying to run a pandas script against an Excel file using Apache NiFi.
I successfully use GetFile to pick up the Excel file. Then I want to run a simple script on it (made up for testing purposes):
#Import necessary modules
import pandas as pd
#Import data from excel files
table_EVS = pd.read_excel("path", sheet_name="1")
#Final_table
table_EVS.to_csv(output)
I have tried both ExecuteScript and ExecuteStreamCommand, passing this script as a .py file:
#Import necessary modules
import pandas as pd
import sys
#Import data from excel files
table_EVS = pd.read_excel(sys.stdin)
#Final_table
table_EVS.to_csv(sys.stdout, index=False)
But it is not successful. Any ideas?
To clarify, my goal is not to convert to CSV but to run a Python pandas script successfully against any kind of file (provided my pandas code can handle it) from inside NiFi. Am I better off doing this in Apache Airflow?
Upvotes: 0
Views: 974
Reputation: 589
You can do this in NiFi. Here are some example scripts that may help point you in the right direction:
https://github.com/sucitw/python-script-in-NiFi
https://www.nifi.rocks/using-the-executescript-processor/
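One likely reason the stdin version in the question fails: in Python 3, sys.stdin is a text stream, while an Excel workbook is binary, and pd.read_excel needs bytes or a seekable binary file-like object. A piped stdin is not seekable either. Below is a minimal sketch of a workaround, assuming the script is launched by ExecuteStreamCommand with a regular CPython interpreter that has pandas and an Excel engine such as openpyxl installed. (ExecuteScript's Python engine is Jython, which cannot load pandas.) The function names buffer_flowfile and excel_to_csv are made up for illustration.

```python
import io
import sys


def buffer_flowfile(stream):
    """Read a whole binary stream into a seekable in-memory buffer."""
    return io.BytesIO(stream.read())


def excel_to_csv(in_stream, out_stream):
    """Parse an Excel workbook from in_stream and write it as CSV to out_stream.

    in_stream must be a binary stream (for example sys.stdin.buffer);
    out_stream must be a text stream (for example sys.stdout).
    """
    # Imported here so that only this function depends on pandas.
    import pandas as pd

    # Assumption: the first sheet is the one wanted (the read_excel default).
    table = pd.read_excel(buffer_flowfile(in_stream))
    table.to_csv(out_stream, index=False)
```

Saved as a script, it would end with a call such as excel_to_csv(sys.stdin.buffer, sys.stdout). In ExecuteStreamCommand, Command Path would point at the Python interpreter and Command Arguments at the script path; the incoming flow file content is then piped to the script's stdin and its stdout becomes the outgoing flow file content.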
Upvotes: 2