How to add a duplicate csv column using pandas

Question

I have a CSV that has just one column with domains, similar to this:

google.com
yahoo.com
cnn.com
toast.net

I want to add a duplicate column and add the headers domain and matches so my csv will look like:

domain       matching
google.com   google.com
yahoo.com    yahoo.com
cnn.com      cnn.com
toast.net    toast.net

I tried the following in my python script using pandas:

df = read_csv('temp.csv')
df.columns = ['domain', 'matching']
df['matching'] = df['domain']
df.to_csv('temp.csv', index=False)

but I am getting the following error:

"ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements".

I assume I need to add another column first? Can I do this using pandas?

jezrael · Accepted Answer

You can add parameter name to read_csv:

import pandas as pd
import io

temp=u"""google.com
yahoo.com
cnn.com
toast.net"""

#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), names=['domain'])
#real data
#df = pd.read_csv('temp.csv', names=['domain'])

print (df)
       domain
0  google.com
1   yahoo.com
2     cnn.com
3   toast.net

df['matching'] = df['domain']

print (df.to_csv(index=False))
#real data
#df.to_csv('temp.csv', index=False)
domain,matching
google.com,google.com
yahoo.com,yahoo.com
cnn.com,cnn.com
toast.net,toast.net

You can modify your solution, but you lost first row, because it is read as column name:

df = pd.read_csv(io.StringIO(temp))
print (df)
#real data
#df = pd.read_csv('temp.csv')
  google.com
0  yahoo.com
1    cnn.com
2  toast.net

df.columns = ['domain']
df['matching'] = df['domain']

df.to_csv('temp.csv', index=False)

But you can add parameter header=None to read_csv and remove second value from df.columns = ['domain', 'matching'], because first DataFrame has only one column:

import pandas as pd
import io

temp=u"""google.com
yahoo.com
cnn.com
toast.net"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), header=None)
print (df)
#real data
#df = pd.read_csv('temp.csv', header=None)
            0
0  google.com
1   yahoo.com
2     cnn.com
3   toast.net

df.columns = ['domain']
df['matching'] = df['domain']

df.to_csv('temp.csv', index=False)

How to add a duplicate csv column using pandas

Answers (1)

Related Questions