Reputation: 2789
I have a CSV file in the following format:
Date,Time,Open,High,Low,Close,Volume
09/22/2003,00:00,1024.5,1025.25,1015.75,1022.0,720382.0
09/23/2003,00:00,1022.0,1035.5,1019.25,1022.0,22441.0
10/22/2003,00:00,1035.0,1036.75,1024.25,1024.5,663229.0
I would like to add 20 new columns to this file. The values in each new column are synthetic, created simply by drawing random numbers.
It would be something like this:
import pandas as pd
from random import randrange
df = pd.read_csv('dataset.csv')
print(len(df))
input()
for i in range(len(df)):
    # Data that already exists
    date = df.values[i][0]
    time = df.values[i][1]
    open_value = df.values[i][2]
    high_value = df.values[i][3]
    low_value = df.values[i][4]
    close_value = df.values[i][5]
    volume = df.values[i][6]
    # This is the new data
    prediction_1 = randrange(3)
    prediction_2 = randrange(3)
    prediction_3 = randrange(3)
    prediction_4 = randrange(3)
    prediction_5 = randrange(3)
    prediction_6 = randrange(3)
    prediction_7 = randrange(3)
    prediction_8 = randrange(3)
    prediction_9 = randrange(3)
    prediction_10 = randrange(3)
    prediction_11 = randrange(3)
    prediction_12 = randrange(3)
    prediction_13 = randrange(3)
    prediction_14 = randrange(3)
    prediction_15 = randrange(3)
    prediction_16 = randrange(3)
    prediction_17 = randrange(3)
    prediction_18 = randrange(3)
    prediction_19 = randrange(3)
    prediction_20 = randrange(3)
    # How to concatenate these data row by row in a matrix?
    # How to add new column names and save the file?
I would like to concatenate them (old+synthetic data) and, after that, I would like to add 20 new columns named 'synthetic1', 'synthetic2', ..., 'synthetic20', to the existing column names and then save the resulting new dataset in a new text file.
I could do that easily with NumPy if everything were numeric, but here the data is not purely numeric (the Date and Time columns are strings), so I don't know how to do it, or whether it is possible at all. Is it possible to do this with Pandas or another library?
Upvotes: 0
Views: 63
Reputation: 21749
Here's one way to do it:
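import numpy as np
import pandas as pd
# df is the DataFrame already loaded from the CSV in the question
n_row = len(df)  # must match the number of rows in the existing df
n_col = 20
# random integers in [0, 100); use 3 instead of 100 to mirror randrange(3) from the question
f = pd.DataFrame(np.random.randint(100, size=(n_row, n_col)), columns=['synthetic' + str(x) for x in range(1, n_col + 1)], index=df.index)
# axis=1 places the new columns side by side; the default axis=0 would stack them as extra rows
df = pd.concat([df, f], axis=1)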
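For completeness, here is a minimal end-to-end sketch that also covers the saving step from the question. It is an illustration under a few assumptions rather than the only way to do this: the sample rows are read from an in-memory string instead of 'dataset.csv', the values are drawn from 0 to 2 to mirror randrange(3), the seed is fixed only so the run is reproducible, and the names `synthetic`, `result`, and `csv_out` are made up for this example.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question; a real run would call
# pd.read_csv('dataset.csv') instead of reading from a string.
csv_text = """Date,Time,Open,High,Low,Close,Volume
09/22/2003,00:00,1024.5,1025.25,1015.75,1022.0,720382.0
09/23/2003,00:00,1022.0,1035.5,1019.25,1022.0,22441.0
10/22/2003,00:00,1035.0,1036.75,1024.25,1024.5,663229.0
"""
df = pd.read_csv(io.StringIO(csv_text))

n_col = 20
# Assumption: a fixed seed, used here only for reproducibility.
rng = np.random.default_rng(seed=0)

# One random integer in {0, 1, 2} per row and per new column,
# equivalent in spirit to calling randrange(3) for each cell.
synthetic = pd.DataFrame(
    rng.integers(0, 3, size=(len(df), n_col)),
    columns=[f"synthetic{i}" for i in range(1, n_col + 1)],
    index=df.index,  # align on the existing index so rows line up
)

# axis=1 appends the new columns to the right of the existing ones.
result = pd.concat([df, synthetic], axis=1)

# With no path argument, to_csv returns the CSV text as a string.
# Passing a file name (any name of your choice) writes a file instead.
csv_out = result.to_csv(index=False)
```

Mixed column types are not a problem here, since each DataFrame column keeps its own dtype: Date and Time stay as strings while the new columns are integers.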
Upvotes: 1