I am trying to develop a simple classification program using scikit-learn. I want to read my set of TSV values into an array, then save a CSV containing the first value of each TSV row together with a random 1 or 0. The output CSV should look like this:
tsvValue1, random1or0, e.g.
string123, 0
foo234, 1
I have nearly all of the code in separate pieces; my problem is fitting it all together.
import numpy as np
from sklearn import metrics,preprocessing,cross_validation
import pandas as p
loadData = lambda f: np.genfromtxt(open(f,'r'), delimiter=' ')
def main():
    traindata = list(np.array(p.read_table('../data/train.tsv'))[:,2])
    testdata = list(np.array(p.read_table('../data/test.tsv'))[:,2])
    y = np.array(p.read_table('../data/train.tsv'))[:,-1]
    X_all = traindata + testdata
    # What can I do below? What can I use to export to csv
    # properly with an appended 1 or 0 value below ?
    from random import randint
    randomInt = randint(0,1) #Inclusive
    testfile = p.read_csv(
        '../data/test.tsv', sep="\t", na_values=['?'], index_col=1)
    pred_df = p.DataFrame(testdata, index=testfile.index, columns=['label'])
    pred_df.to_csv('test.csv')
    print ("your random file has been created..")
if __name__=="__main__":
    main()
UPDATE: Standard format of the input TSV file:
foo1 foo2 foo3 foo4 fooN
RelevantString123123123
RelevantString456456456
RelevantString789789789
Format of desired resulting csv:
RelevantString123123123,1
RelevantString456456456,0
RelevantString789789789,1
The second value (1 or 0) on each line of the CSV file is randomly generated.
Upvotes: 0
Views: 849
Reputation: 6710
Given the file input.tsv with the following content (separated by tabs):
foo1 foo2 foo3 foo4 fooN
RelevantString123123123
RelevantString456456456
RelevantString789789789
This shows how to get the output you want:
>>> import numpy as np
>>> import pandas
>>> df = pandas.read_csv('input.tsv', sep='\t')
>>> df['value'] = pandas.Series(np.random.randint(2, size=len(df)), index=df.index)
>>> df.to_csv('output.csv', columns=['foo1', 'value'], index=False)
The content of output.csv is then (the second column is random, so your values will differ):
foo1,value
RelevantString123123123,1
RelevantString456456456,0
RelevantString789789789,0
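If you want this as a reusable step in a script rather than an interactive session, the same idea can be wrapped in a function. This is only a sketch: the function name write_random_labels and its seed and header parameters are made up for illustration, and it assumes NumPy 1.17 or later for np.random.default_rng. It writes no header row by default, which matches the format asked for in the question.

```python
import numpy as np
import pandas as pd


def write_random_labels(tsv_path, csv_path, seed=None, header=False):
    """Write the first TSV column plus a random 0/1 label to a CSV file.

    Hypothetical helper for illustration; the name and parameters are
    assumptions, not part of pandas or scikit-learn.
    """
    df = pd.read_csv(tsv_path, sep="\t")
    # A seeded generator makes the random labels reproducible between runs.
    rng = np.random.default_rng(seed)
    out = pd.DataFrame({
        # Keep only the first column, whatever it is called.
        df.columns[0]: df.iloc[:, 0],
        # integers(0, 2) draws from {0, 1}; the upper bound is exclusive.
        "value": rng.integers(0, 2, size=len(df)),
    })
    # index=False drops the row numbers; header=False drops the column names.
    out.to_csv(csv_path, index=False, header=header)
    return out
```

Calling write_random_labels('input.tsv', 'output.csv') should produce one line per input row; pass header=True to keep the foo1,value header line, or a fixed seed to get the same labels on every run.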
Upvotes: 1