Reputation: 68
Currently I am writing the pandas dataframe to a CSV file and loading it as a weka dataset with the CSV loader. Is there a mechanism to directly load a pandas dataframe into a weka dataset without creating an intermediate CSV file?
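import os
import pandas as pd
from weka.core.converters import Loader
from weka.filters import Filter
# helper is the asker's own utility module (not shown here)
learn_df = pd.DataFrame.from_records([s.to_dict() for s in learnList])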
header = ["reviewId","word","type","positive_sentiment","negative_sentiment","number_of_noun","sentence","hasNeg","overallSentiment","sentiment"]
learn_df.to_csv(helper.get_data_dir() + os.sep + "resultTest.csv", index=None, header=True,columns=header)
diabetes_file = helper.get_data_dir() + os.sep + "resultTest.csv"
helper.print_info("Loading dataset: " + diabetes_file)
loader = Loader("weka.core.converters.CSVLoader")
diabetes_data = loader.load_file(diabetes_file)
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "1,2,7"])
remove.inputformat(diabetes_data)
filtered = remove.filter(diabetes_data)
# code to classify instances here
Converting to CSV and loading from CSV every time I classify makes this a costly process. Is there a mechanism to avoid this?
Upvotes: 1
Views: 1071
Reputation: 142
@Manish You can convert the pandas dataframe into either a list or a numpy matrix and then use the corresponding weka method, create_instances_from_lists() or create_instances_from_matrices().
For more details you can look into the weka examples at http://fracpete.github.io/python-weka-wrapper/examples.html
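A minimal sketch of the matrix route, assuming python-weka-wrapper is installed, the JVM has already been started with weka.core.jvm.start(), and every selected column is numeric. The helper names dataframe_to_matrices and dataframe_to_instances are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd


def dataframe_to_matrices(df, feature_cols, class_col):
    """Split a DataFrame into a 2-D feature matrix and a 1-D class vector.

    Assumes all selected columns hold numeric values.
    """
    x = df[feature_cols].to_numpy(dtype=float)
    y = df[class_col].to_numpy(dtype=float)
    return x, y


def dataframe_to_instances(df, feature_cols, class_col, name="data"):
    """Build a weka dataset from a DataFrame without an intermediate CSV.

    Requires a running JVM (weka.core.jvm.start() called beforehand).
    """
    # Imported lazily so the helper above works without python-weka-wrapper.
    from weka.core.dataset import create_instances_from_matrices

    x, y = dataframe_to_matrices(df, feature_cols, class_col)
    data = create_instances_from_matrices(x, y, name=name)
    # The y vector is appended as the last attribute; mark it as the class.
    data.class_is_last()
    return data
```

With the dataframe from the question, this could be called as dataframe_to_instances(learn_df, ["positive_sentiment", "negative_sentiment", "number_of_noun"], "sentiment"), provided those columns are numeric. String columns such as "word" or "sentence" would have to be dropped or encoded first, since the matrix-based method only accepts numeric data.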
Regarding setting the last column to nominal type instead of numeric, as mentioned in the comments by @Pedro Pablo Severin Honorato, you can use weka filters.
For example:
from weka.filters import Filter
num_to_nom = Filter(classname="weka.filters.unsupervised.attribute.NumericToNominal", options=["-R", "last"])
num_to_nom.inputformat(data)  # data is the weka dataset whose last column is numeric
newData = num_to_nom.filter(data)  # newData is the weka dataset whose last column is nominal
Hope this helps!
Upvotes: 3