Mary

Reputation: 1142

Exporting a random sample from CSV file to a new CSV file - output is messy

I am trying to export a random subset of a CSV file to a new CSV file using the following code:

import random

with open("DepressionEffexor.csv", "r") as effexor:
    lines = [line for line in effexor]
    random_choice = random.sample(lines, 229)

with open("effexorSample.csv", "w") as sample:
    sample.write("\n".join(random_choice))

But the problem is that the output CSV file is very messy. For example, part of the data in a field is printed on the next line. How can I solve this problem? In addition, I would like to know how I can do this with pandas instead. Thanks!

Upvotes: 1

Views: 669

Answers (2)

Harshavardhan Ramanna

Reputation: 738

Using pandas, this translates to:

import numpy as np
import pandas as pd

# Read the CSV file and store it as a dataframe
df = pd.read_csv('DepressionEffexor.csv')

# Shuffle the rows of the dataframe and store the result
df_shuffled = df.iloc[np.random.permutation(len(df))]

# Optionally reset the index; reset_index returns a new dataframe,
# so the result has to be assigned back
df_shuffled = df_shuffled.reset_index(drop=True)

You can slice the shuffled dataframe afterwards to keep only as many rows as you want.
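As a minimal sketch of that last step, the shuffled dataframe can be cut down to the first n rows and written out. The small in-memory dataframe, its column names, the seed, and n = 4 are assumptions for illustration only; the question uses 229 rows from DepressionEffexor.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from DepressionEffexor.csv
df = pd.DataFrame({"id": range(10), "review": [f"text {i}" for i in range(10)]})

# Seeded generator so the shuffle is repeatable (the seed is an assumption)
rng = np.random.default_rng(0)
df_shuffled = df.iloc[rng.permutation(len(df))].reset_index(drop=True)

n = 4  # the question asks for 229
subset = df_shuffled.iloc[:n]

# With no path, to_csv returns the CSV text; index=False omits the index column
csv_text = subset.to_csv(index=False)
```

Passing a file name such as "effexorSample.csv" as the first argument of to_csv writes the same text to disk instead of returning it.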

Upvotes: 0

binaryaaron

Reputation: 641

Assuming you had a CSV read into pandas:

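import pandas

n = 229  # number of rows to sample
df = pandas.read_csv("csvfile.csv")
sample = df.sample(n)
sample.to_csv("sample.csv", index=False)  # index=False omits the row index column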

You could make it even shorter:

df.sample(n).to_csv("sample.csv", index=False)

The pandas IO docs have a great deal more information and options, as does the documentation for the DataFrame.sample method.
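A likely cause of the messy output in the question is that reading the file line by line splits any quoted field that contains a newline, and joining with "\n" adds a second line break to lines that already end in one. The sketch below, using made-up data in place of the real file, suggests that read_csv and to_csv keep such fields intact; the column names and the random_state value are assumptions.

```python
import io

import pandas as pd

# Hypothetical CSV text with a quoted field that spans two lines
raw = 'id,review\n1,"line one\nline two"\n2,"plain"\n3,"has, comma"\n'

# read_csv parses quoted fields, so the embedded newline stays inside one cell
df = pd.read_csv(io.StringIO(raw))

# random_state makes the sample repeatable (the value is an assumption)
sample = df.sample(n=2, random_state=0)

# to_csv quotes fields that need it; index=False omits the index column
out = sample.to_csv(index=False)

# Reading the output back gives the same rows, with multi-line fields preserved
back = pd.read_csv(io.StringIO(out))
```

Here io.StringIO only stands in for real files; with file paths the calls are the same.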

Upvotes: 4
