Reputation: 4245
Below is code where I read a CSV file and take a random sample of 700 rows
from it.
I need to do this for multiple files, but if I iterate over the files, the sample (being random) will be different for each file, whereas I want to keep the same sample once it has been randomly generated.
import pandas as pd
df = pd.read_csv('file.csv', delim_whitespace=True)
df_s = df.sample(n=700)
My idea is to take the sampled row numbers and pass them to the next file, but this does not seem very elegant.
Do you know any good solutions to this issue?
CLARIFICATION
The files have different lengths, but every file has at least 750 rows.
desired outcome EXAMPLE
df1 = pd.read_csv('file1.csv', delim_whitespace=True)
df_s1 = df1.sample(n=700)  # choose random sample
df2 = pd.read_csv('file2.csv', delim_whitespace=True)
df_s2 = df2.sample(n=700)  # use same random sample as above
Upvotes: 6
Views: 7769
Reputation: 1824
Another option would be to set np.random.seed(123).
This has the advantage that it sets the random seed at once for all pandas
functions that draw from NumPy's global random state, such as sample() called without random_state.
A more detailed answer can be found here
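One caveat worth illustrating: since the files have different lengths, re-seeding before each sample() call is not guaranteed to pick the same row positions in every file. A minimal sketch of one way around this, assuming "the same sample" means the same row positions: draw the positions once from the shortest file's range and reuse them with iloc. Note that this uses a seeded np.random.default_rng generator rather than np.random.seed, and that the helper name sample_same_rows and the stand-in DataFrames are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd

def sample_same_rows(frames, n=700, seed=123):
    """Return the same n row positions from every DataFrame in frames.

    Hypothetical helper, not part of pandas.
    """
    # Positions must exist in every file, so draw them from the shortest one.
    min_len = min(len(df) for df in frames)
    rng = np.random.default_rng(seed)
    positions = rng.choice(min_len, size=n, replace=False)
    return [df.iloc[positions] for df in frames]

# Stand-in data; in practice each frame would come from pd.read_csv(...).
df1 = pd.DataFrame({"x": np.arange(750)})
df2 = pd.DataFrame({"x": np.arange(900) * 10})
df_s1, df_s2 = sample_same_rows([df1, df2])
```

Here df_s1 and df_s2 contain the same 700 row positions even though df1 and df2 have different lengths. Rows beyond the shortest file's length can never be selected, which may or may not be acceptable for your use case.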
Upvotes: 0