Reputation: 87
I'm working with a large set of CSV data and I want to merge several columns from different positions into one column, separated by semicolons (;).
What I have now is:
a b c d
1 2 3 4
1 2 3 4
1 2 3 4
I want to change it to this, so that all my data is only in column d:
a b c d
a=1;b=2;c=3;d=4;
a=1;b=2;c=3;d=4;
a=1;b=2;c=3;d=4;
I know how to delete the empty columns a, b and c, but I can't figure out how to merge the data from columns a, b and c into column d. Thanks in advance.
The code that I have so far is:
# Parsing the custom formatted data with csv module.
# reads the custom format input and spits out the output in VCF format.
import csv
# input and output
# (binary mode is what the Python 2 csv module expects; on Python 3 use text mode with newline='')
with open('1-0002', 'rb') as csvin, open('converted1','wb') as csvout:
    # reading and writing are all tab delimited
    reader = csv.reader(csvin, delimiter = '\t')
    writer = csv.writer(csvout, delimiter = '\t')
    # add headings before the for loop to prevent the heading being affected by column manipulation.
    writer.writerow(["#CHROM","POS","ID","REF","ALT","QUAL","FILTER","INFO"])
    for row in reader:
        # deleting unnecessary columns; the 'del' statements must go in descending index order,
        # otherwise earlier deletions shift the later indices and can raise an IndexError
        # manually deleting columns since the input data is in custom format.
        del row[11]
        del row[10]
        del row[9]
        del row[8]
        del row[7]
        del row[6]
        del row[5]
        del row[1]
        del row[0]
        # inserting 1 and . in specific columns
        row.insert(0,'1')
        row.insert(2,'.')
        row.insert(5,'.')
        row.insert(7,'') # inserting empty column for INFO headings.
        # change 'YES' to 'PASS' , leaving HETERO as it is.
        if row[6] == 'YES':
            row[6] = 'PASS'
        writer.writerow(row)
Starting from the code above, I want to put the data from several different columns into the INFO column.
Upvotes: 1
Views: 3766
Reputation: 7179
Try pandas:
import pandas as pd
df = pd.read_csv('1-0002.csv')
df['d_merged'] = df.apply(lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'],row['b'],row['c'],row['d']), axis=1)
This gives:
>>> df
   a  b  c  d          d_merged
0  1  2  3  4  a=1;b=2;c=3;d=4;
1  1  2  3  4  a=1;b=2;c=3;d=4;
2  1  2  3  4  a=1;b=2;c=3;d=4;
Now delete the columns you don't want:
df = df.drop(['a','b','c','d'], axis=1)
>>> df
           d_merged
0  a=1;b=2;c=3;d=4;
1  a=1;b=2;c=3;d=4;
2  a=1;b=2;c=3;d=4;
Now rename d_merged if you wish:
df = df.rename(columns={'d_merged':'d'})
>>> df
                  d
0  a=1;b=2;c=3;d=4;
1  a=1;b=2;c=3;d=4;
2  a=1;b=2;c=3;d=4;
(Alternatively, skip the rename by assigning the merged string directly to d, in which case only a, b and c need to be dropped afterwards:
df['d'] = df.apply(lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'],row['b'],row['c'],row['d']), axis=1)
)
Then write to CSV:
df.to_csv('csvout.csv', index=False)
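If the list of columns to merge is long, typing every name into the format string gets tedious. The following is a minimal sketch of the same idea driven by a list of column names. The in-memory sample data, the `cols` list and the `merge_row` helper are illustrative assumptions, not part of the original file; `dtype=str` is used so the values are kept exactly as they appear in the input.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real file.
sample = io.StringIO("a,b,c,d\n1,2,3,4\n1,2,3,4\n1,2,3,4\n")
df = pd.read_csv(sample, dtype=str)  # keep every value as text

cols = ['a', 'b', 'c', 'd']  # assumed: columns to merge, in output order

def merge_row(row):
    # Build "name=value;" for each chosen column and concatenate them.
    return ''.join('{0}={1};'.format(c, row[c]) for c in cols)

# New frame holding only the merged column, named d.
merged = pd.DataFrame({'d': df.apply(merge_row, axis=1)})
print(merged)
```

Changing `cols` is then the only edit needed to merge a different set of columns.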
Upvotes: 0
Reputation: 1355
Simple answer: don't bother deleting columns from the row; build a NEW row that picks only what you want.
It will look like this:
# leave row alone, don't bother deleting columns in it.
# the indices below are placeholders; use the positions of the columns you actually want.
new_row = ["a=%s;b=%s;c=%s;d=%s" % (row[12], row[13], row[14], row[15])]
# new_row has only one column, with a string constructed of what you need.
writer.writerow(new_row)
And voila, that should do it for you. You can also copy any other columns you need into new_row, and append() whatever else you might desire.
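To make the idea concrete, here is a small self-contained sketch for Python 3 that copies a couple of columns and appends a merged key=value string as the last field. The in-memory input, the `info_fields` mapping and the choice of which columns to keep are assumptions for illustration only; the real column positions depend on the custom input format.

```python
import csv
import io

# Hypothetical tab-delimited input; the real file and column positions will differ.
csvin = io.StringIO("x\ty\t1\t2\t3\t4\nx\ty\t1\t2\t3\t4\n")
csvout = io.StringIO()

reader = csv.reader(csvin, delimiter='\t')
writer = csv.writer(csvout, delimiter='\t', lineterminator='\n')

# Assumed mapping of output keys to column indices in the input row.
info_fields = [('a', 2), ('b', 3), ('c', 4), ('d', 5)]

for row in reader:
    # Build "key=value;" for each mapped column and concatenate them.
    info = ''.join('%s=%s;' % (key, row[idx]) for key, idx in info_fields)
    # Copy the columns to keep, then append the merged string.
    new_row = [row[0], row[1], info]
    writer.writerow(new_row)

print(csvout.getvalue())
```

Because the original row is never modified, the indices always refer to the input layout, which avoids the bookkeeping that a series of del and insert calls requires.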
Upvotes: 3
Reputation: 2660
Just read the file with a reader and write the data you read into another file. This example assumes you still want to keep the empty columns a, b and c.
import csv

# Python 3: open both files in text mode with newline='' for the csv module
with open('filename.csv', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    r = csv.reader(infile)
    w = csv.writer(outfile)
    for row in r:
        # joins the current row into a string separated by spaces
        outdata = ' '.join(row)
        # leaves first three columns empty, writes old content into fourth column
        data_to_write = ('', '', '', outdata)
        w.writerow(data_to_write)
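The same loop can produce the name=value; format asked for in the question by pairing each value with its heading. This is a rough sketch for Python 3 that assumes the first line of the input is a header row; the in-memory sample data stands in for the real files.

```python
import csv
import io

# Hypothetical in-memory input standing in for filename.csv.
infile = io.StringIO("a,b,c,d\n1,2,3,4\n1,2,3,4\n")
outfile = io.StringIO()

r = csv.reader(infile)
w = csv.writer(outfile, lineterminator='\n')

header = next(r)    # assumed: first line holds the column names
w.writerow(header)  # write the heading row through unchanged

for row in r:
    # Pair each heading with its value and build "name=value;" pieces.
    outdata = ''.join('{0}={1};'.format(name, value)
                      for name, value in zip(header, row))
    # Leave the first three columns empty, put the merged string in the fourth.
    w.writerow(['', '', '', outdata])

print(outfile.getvalue())
```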
Upvotes: 0