Reputation: 1142
I am trying to export a random subset of a CSV file to a new CSV file using the following code:
with open("DepressionEffexor.csv", "r") as effexor:
    lines = [line for line in effexor]
random_choice = random.sample(lines, 229)
with open("effexorSample.csv", "w") as sample:
    sample.write("\n".join(random_choice))
But the problem is that the output CSV file is very messy. For example, part of the data in a field is printed on the next line. How can I solve this problem? In addition, I would like to know how I can use pandas for this instead. Thanks!
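For context, two things in the snippet above can produce this kind of output: each line read from the file still ends with its own newline, so joining with "\n" doubles the line breaks, and a quoted field that contains a line break spans several physical lines, so sampling line by line can cut a record in half. A minimal sketch using the standard csv module, assuming the file has a header row (the function name `sample_csv_rows` and the `seed` parameter are made up for illustration):

```python
import csv
import random


def sample_csv_rows(src_path, dst_path, k, seed=None):
    """Copy the header plus k randomly chosen records from src_path to dst_path."""
    rng = random.Random(seed)
    # newline="" lets the csv module handle line endings and embedded newlines
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)   # assumption: the first record is a header
        rows = list(reader)     # each item is one full record, not one text line
    chosen = rng.sample(rows, k)
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(chosen)  # the writer quotes fields where needed
    return chosen
```

For the files in the question this would be called as sample_csv_rows("DepressionEffexor.csv", "effexorSample.csv", 229).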
Upvotes: 1
Views: 669
Reputation: 738
Using pandas, this translates to:
import numpy as np
import pandas as pd
# Read the csv file and store it as a dataframe
df = pd.read_csv('DepressionEffexor.csv')
# Shuffle the dataframe and store it
df_shuffled = df.iloc[np.random.permutation(len(df))]
# Optionally reset the index; reset_index returns a new dataframe, so assign it
df_shuffled = df_shuffled.reset_index(drop=True)
You can slice the shuffled dataframe later to choose the rows you want.
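As a rough sketch of that last step, assuming a small stand-in dataframe in place of the real file (the column names, the seed and n = 4 are made up for illustration), taking the first n rows of the shuffled frame gives a random subset of size n:

```python
import numpy as np
import pandas as pd

# Small stand-in frame; in practice this would come from pd.read_csv(...)
df = pd.DataFrame({"id": range(10), "score": np.arange(10) * 1.5})

# Shuffle the row positions with a seeded generator so the result is repeatable
rng = np.random.default_rng(0)
df_shuffled = df.iloc[rng.permutation(len(df))].reset_index(drop=True)

# The first n rows of the shuffled frame are a random subset of size n
n = 4
subset = df_shuffled.iloc[:n]

# With no path, to_csv returns the CSV text; index=False leaves out the row index
csv_text = subset.to_csv(index=False)
```

Passing a filename as the first argument of to_csv writes the subset to disk instead of returning a string.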
Upvotes: 0
Reputation: 641
Assuming you had a CSV read into pandas:
import pandas
df = pandas.read_csv("csvfile.csv")
sample = df.sample(n)  # n is the number of rows to draw, e.g. 229
sample.to_csv("sample.csv")
You could make it even shorter:
df.sample(n).to_csv("sample.csv")
The pandas IO docs have a great deal more information and options available, as does the documentation for the DataFrame.sample method.
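Two of those options are worth a quick illustration: random_state on DataFrame.sample makes the draw repeatable, and index=False on to_csv keeps the original row labels out of the output. A minimal sketch, assuming an in-memory stand-in for the file (the data, the seed 42 and n=2 are made up for illustration):

```python
import io
import pandas as pd

# Stand-in for the real file; read_csv accepts a path or a file-like object.
# The first record has a quoted field containing a line break.
raw = io.StringIO(
    'id,comment\n1,"first line\nsecond line"\n2,plain\n3,"has, comma"\n4,another\n'
)
df = pd.read_csv(raw)

# n is the number of rows to draw; random_state makes the draw repeatable
sample = df.sample(n=2, random_state=42)

# index=False avoids writing the original row labels as an extra column
out = io.StringIO()
sample.to_csv(out, index=False)
```

Because read_csv parses quoted fields, a record with an embedded line break stays in one row and is quoted again when written back out.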
Upvotes: 4