HBS

Reputation: 55

Using Python to randomize csv file

I have a csv file with 2 columns:

1 A
2 B
3 C
4 D

My aim is to use Python to open the file, read it, shuffle the two columns independently of each other (e.g. so that 1 ends up on the same line as C, 2 with D, etc.), and then save the shuffled columns to a different csv file.

I have read a bit about csv.writer, but I am not yet sure how to use it.

The only problem is that I need to keep the column headers intact; they must not be randomized. My code so far is as follows:

import csv
import random

with open ("my_file") as f:
    l = list(csv.reader(f))

random.shuffle(l)

with open("random.csv", "w") as f:
    csv.writer(f).writerows(l)

Upvotes: 2

Views: 5487

Answers (4)

Pere

Reputation: 2033

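Have a look at the source code of csvshuf. It reads the header row first, shuffles the selected columns, and then writes the header back unchanged: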

reader = csv.reader(args.infile, delimiter=args.delimiter, quotechar=args.quotechar)

"""Get the first row and use it as column headers"""
headers = next(reader)

"""Create a matrix of lists of columns"""
table = []
for c in range(len(headers)):
    table.append([])
for row in reader:
    for c in range(len(headers)):
        table[c].append(row[c])

cols = args.columns

for c in cols:
    args.shuffle(table[c - 1])

"""Transpose the matrix"""
table = zip(*table)

writer = csv.writer(sys.stdout, delimiter=args.output_delimiter)
writer.writerow(headers)
for row in table:
    writer.writerow(row)
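
The excerpt above depends on an args object that is not shown, so it does not run on its own. A minimal self-contained sketch of the same idea might look like the following. The function name shuffle_columns, the sample data, and the comma delimiter are assumptions made for illustration; they are not part of csvshuf.

```python
import csv
import io
import random


def shuffle_columns(rows, columns, rng=random):
    """Shuffle the given 1-based columns independently; the first row is the header."""
    headers, body = rows[0], rows[1:]
    if not body:
        return [headers]
    # Turn the list of rows into a list of columns.
    table = [list(col) for col in zip(*body)]
    for c in columns:
        rng.shuffle(table[c - 1])
    # Transpose back into rows and put the header on top.
    return [headers] + [list(row) for row in zip(*table)]


# Hypothetical sample input standing in for the real file.
sample = "num,letter\n1,A\n2,B\n3,C\n4,D\n"
rows = list(csv.reader(io.StringIO(sample)))
shuffled = shuffle_columns(rows, columns=[1, 2])

out = io.StringIO()
csv.writer(out).writerows(shuffled)
print(out.getvalue())
```

Only the columns listed in columns are shuffled, so passing columns=[1] would leave the letters in their original order.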

Upvotes: 0

wwii

Reputation: 23763

Maybe don't use the csv module at all. How about this:

Create two empty lists, one to hold the numbers and one to hold the letters.

Open the file.

For each line in the file:

Split the line.

Add the number to the numbers list.

Add the letter to the letters list.

Shuffle the numbers list.

Take one item from each list, in sequence, and write them to a file.

Repeat until both lists are used up.

The built-in function zip should help with that last bit.
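
The steps above could be sketched roughly as follows. This assumes whitespace-separated lines like the sample in the question. The function names, the file names, and the treatment of the first line as a header are assumptions added for illustration.

```python
import random


def shuffle_pairs(lines, rng=random):
    """Split each line into number and letter, shuffle the numbers, re-pair with zip."""
    numbers, letters = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        number, letter = line.split()
        numbers.append(number)
        letters.append(letter)
    rng.shuffle(numbers)
    # zip takes one item from each list, in sequence.
    return [f"{n} {l}" for n, l in zip(numbers, letters)]


def shuffle_file(src="input.csv", dst="random.csv"):
    """Hypothetical file names; assumes the file has at least a header line."""
    with open(src) as f:
        header, *body = f.read().splitlines()
    with open(dst, "w") as f:
        # The header is copied through unchanged.
        f.write("\n".join([header] + shuffle_pairs(body)) + "\n")
```

Shuffling only the numbers is enough to randomize the pairing; shuffling the letters as well would also change the order in which the letters appear.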

Upvotes: 0

JuniorCompressor

Reputation: 20025

You can read the rows as a list, extract the two columns, shuffle each one, zip the columns back together, and finally write the result to a new csv file:

import csv
import random

with open("input.csv") as f:
    r = csv.reader(f)
    header, l = next(r), list(r)

a = [x[0] for x in l]
random.shuffle(a)

b = [x[1] for x in l]
random.shuffle(b)

# Python 3: open in text mode with newline="" and materialize zip as a list
with open("random.csv", "w", newline="") as f:
    csv.writer(f).writerows([header] + list(zip(a, b)))

Upvotes: 3

Haleemur Ali

Reputation: 28303

HBS, the problem with your code is that it shuffles the order of the rows, rather than shuffling each column individually.
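
A small sketch of the difference, using hypothetical in-memory rows instead of the real file:

```python
import random

rows = [["1", "A"], ["2", "B"], ["3", "C"], ["4", "D"]]

# Shuffling the list of rows only reorders whole rows: each number keeps its letter.
by_row = rows[:]
random.shuffle(by_row)

# Shuffling each column on its own lets any number end up next to any letter.
numbers = [r[0] for r in rows]
letters = [r[1] for r in rows]
random.shuffle(numbers)
random.shuffle(letters)
by_column = [list(pair) for pair in zip(numbers, letters)]

print(by_row)
print(by_column)
```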

You can read each column into separate lists, and then apply the shuffle, then combine the two lists together to form a list of rows before writing them to the output file.

To maintain the headers, after you have read the input file, pop the first element off the resulting list and then recombine after shuffling.

Here's the code to illustrate the steps:

import random
import csv

# read the data into lists
with open('input.csv', 'r') as myfile:
    csvreader = csv.reader(myfile, delimiter=' ')
    list1 = []
    list2 = []
    for row in csvreader:
        a, b = row
        list1.append(a)
        list2.append(b)

# pop the first element (headers)
title1, title2 = list1.pop(0), list2.pop(0)

# shuffle the list
random.shuffle(list1)
random.shuffle(list2)

# add the titles back: 
list1 = [title1] + list1
list2 = [title2] + list2

# write rows to output file
with open('output.csv', 'w', newline='') as oput:
    output_rows = list(zip(list1, list2))
    csvwriter = csv.writer(oput, delimiter=' ')
    csvwriter.writerows(output_rows)

Upvotes: 0
