Reputation: 237
I have a CSV
that has just one column with domains
, similar to this:
google.com
yahoo.com
cnn.com
toast.net
I want to add a duplicate column and add the headers domain
and matches
so my csv
will look like:
domain matching
google.com google.com
yahoo.com yahoo.com
cnn.com cnn.com
toast.net toast.net
I tried the following in my python script using pandas:
df = read_csv('temp.csv')
df.columns = ['domain', 'matching']
df['matching'] = df['domain']
df.to_csv('temp.csv', index=False)
but I am getting the following error:
"ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements".
I assume I need to add another column first? Can I do this using pandas?
Upvotes: 1
Views: 1249
Reputation: 863681
You can add parameter name
to read_csv
:
import pandas as pd
import io
temp=u"""google.com
yahoo.com
cnn.com
toast.net"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), names=['domain'])
#real data
#df = pd.read_csv('temp.csv', names=['domain'])
print (df)
domain
0 google.com
1 yahoo.com
2 cnn.com
3 toast.net
df['matching'] = df['domain']
print (df.to_csv(index=False))
#real data
#df.to_csv('temp.csv', index=False)
domain,matching
google.com,google.com
yahoo.com,yahoo.com
cnn.com,cnn.com
toast.net,toast.net
You can modify your solution, but you lost first row, because it is read as column name:
df = pd.read_csv(io.StringIO(temp))
print (df)
#real data
#df = pd.read_csv('temp.csv')
google.com
0 yahoo.com
1 cnn.com
2 toast.net
df.columns = ['domain']
df['matching'] = df['domain']
df.to_csv('temp.csv', index=False)
But you can add parameter header=None
to read_csv
and remove second value from df.columns = ['domain', 'matching']
, because first DataFrame
has only one column:
import pandas as pd
import io
temp=u"""google.com
yahoo.com
cnn.com
toast.net"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), header=None)
print (df)
#real data
#df = pd.read_csv('temp.csv', header=None)
0
0 google.com
1 yahoo.com
2 cnn.com
3 toast.net
df.columns = ['domain']
df['matching'] = df['domain']
df.to_csv('temp.csv', index=False)
Upvotes: 1