Reputation: 3544
I'm using the NumPy
python library to run large-scale edits on a .csv
file. I'm using this python code:
import numpy as np
def main():
try:
e,a,ad,c,s,z,ca,fn,ln,p,p2,g,ssn,cn,com,dob,doh,em = np.loadtxt('c:\wamp\www\_quac\carryover_data\SI\Employees.csv',delimiter=',',unpack=True,dtype='str')
x=0
dob = dob.split('/')
for digit in dob:
if len(digit) == 1:
digit = str('0'+digit)
dob = str(dob[2]+'-'+dob[0]+'-'+dob[1])
doh = doh.split('/')
for digit in doh:
if len(digit) == 1:
digit = str('0'+digit)
doh = str(doh[2]+'-'+doh[0]+'-'+doh[1])
for eID in e:
saveLine=eID+','+a[x]+','+ad[x]+','+c[x]+','+s[x]+','+z[x]+','+ca[x]+','+fn[x]+','+ln[x]+','+p[x]+','+p2[x]+','+g[x]+','+ssn[x]+','+cn[x]+','+com[x]+','+dob[x]+','+doh[x]+','+em[x]+'\n'
saveFile = open('fixedEmployees.csv','a')
saveFile.write(saveLine)
saveFile.close()
x+=1
except Exception, e:
print str(e)
main()
dob
and doh
contain a string, e.g. 4/26/2012
and I'm trying to convert these to mysql
friendly DATE
forms, e.g. 2012-04-26
. The error that is printed when I run this script is
cannot set an array element with a sequence
It does not specify a line and so I don't know what this really means. I'm pretty new to python; I've checked other questions with this same error but I can't make sense of their code. Any help is very appreciated.
Upvotes: 0
Views: 623
Reputation: 3544
Ok, to solve it I had to do a couple things. After removing the try
-except
commands, I found out that the error was on line 5, the line with e,a,ad,c,s
etc. I couldn't eliminate the problem until I simply copied the 2 columns I wanted to focus on only and made a new program for dealing with those.
Then I had to create a .txt
instead of a .csv
because Excel auto-formats the dates and literally changes the values before I can even touch them. There is no way around that, I've learned. You can't turn the date-auto-format off. A serious problem with excel. So here's my solution for this NumPy
script (it changes the first column and keeps the second the same):
import numpy as np
def main():
dob,doh=np.loadtxt('temp.csv',
delimiter=',',
unpack=True,
dtype='str')
x=0
for eachDate in dob:
if any(c.isalpha() for c in eachDate):
newDate=eachDate
elif (eachDate == ''):
newDate=''
else:
sp = eachDate.split('/')
y=0
ndArray = ['','','']
for eachDig in sp:
if len(eachDig) == 1:
eachDig = str('0'+eachDig)
if y == 0:
ndArray[0] = eachDig
elif y == 1:
ndArray[1] = eachDig
elif y == 2:
ndArray[2] = eachDig
newDate=str(ndArray[2]+'-'+ndArray[0]+'-'+ndArray[1])
y=0
y+=1
print eachDate+'--->'+newDate
"""creates a .txt file with the edited dates"""
saveLine=str(newDate+','+doh[x]+'\n')
saveFile=open('__newTemp.txt','a')
saveFile.write(saveLine)
saveFile.close()
x+=1
main()
I then used Data->Import from text with "TEXT" format option in Excel to get the column into my .csv
. I realize this is probably bulky and noobish but it got the job done :3
Upvotes: 0
Reputation: 602
Try using zfill
to reformat the date string so you can have a '0' before your '4'. (zfill
pads a string on the left with zeros to fill the width.)
doh = '4/26/2012'
doh = doh.split('/')
for i, s in enumerate(doh):
doh[i] = s.zfill(2)
doh = doh[2]+'-'+doh[0]+'-'+doh[1]
# result: '2012-04-26'
As for the cannot set an array element with a sequence
it would be helpful to know
where that is occurring. I'm guessing there is something wrong with structure of the array.
Upvotes: 1