Reputation: 51
I have the following code using pandas:
import pandas as pd
file1_csv = 'fileX.csv'
data = pd.read_csv(file1_csv, header=None, usecols=[0, 43])
print(data)
The result is this:
0 43
0 57669557 2020-02-15
1 57779240 2017-02-15
2 96951148 2018-07-24
What I need is to write this result to a new CSV file that looks like this:
col1,col2
57669557,2020-02-15
57779240,2017-02-15
96951148,2018-07-24
My code for that is:
final = pd.DataFrame(data, columns=['col1','col2'])
final.to_csv('finalFile.csv', index=False)
But the output is wrong; it generates the following:
col1,col2
,
,
,
Upvotes: 3
Views: 50
Reputation: 35626
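When the DataFrame constructor is given an already-labelled structure (like another DataFrame), the columns argument selects columns by their existing labels; it does not rename them. data has the column labels 0 and 43, so col1 and col2 match nothing and are created as all-NaN columns, which to_csv writes as empty fields.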
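A minimal sketch reproducing the problem, assuming `data` has the integer column labels 0 and 43 that `read_csv(..., header=None)` produces (the sample values are copied from the question rather than read from `fileX.csv`):

```python
import pandas as pd

# Stand-in for the question's data: header=None gives integer column labels
data = pd.DataFrame({
    0: [57669557, 57779240, 96951148],
    43: ['2020-02-15', '2017-02-15', '2018-07-24'],
})

# 'col1' and 'col2' are not existing labels, so they come back as all-NaN columns
final = pd.DataFrame(data, columns=['col1', 'col2'])
print(final.isna().all().all())  # True

# NaN is written as an empty field by default (na_rep=''), hence the "," rows
print(final.to_csv(index=False))
```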
Instead, build the frame first and then overwrite its column names:
final = data.copy()  # Copy so that relabelling final never touches data
final.columns = ['col1', 'col2'] # Overwrite Column Names
final.to_csv('finalFile.csv', index=False)
Or pass a structure without labels, such as the NumPy array returned by to_numpy(), so that columns simply names the positions:
# Break existing index alignment
final = pd.DataFrame(data.to_numpy(), columns=['col1','col2'])
final.to_csv('finalFile.csv', index=False)
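One side effect of the array route, shown as a small sketch on the same sample values: because `data` mixes an integer column with a string column, `to_numpy()` has to produce a single `object` array, so the rebuilt frame loses the integer dtype of the first column. The CSV text is unaffected, but if you keep working with `final` in pandas, `infer_objects()` re-infers per-column dtypes.

```python
import pandas as pd

data = pd.DataFrame({
    0: [57669557, 57779240, 96951148],
    43: ['2020-02-15', '2017-02-15', '2018-07-24'],
})

# Mixed int/str columns collapse into one object-dtype NumPy array
arr = data.to_numpy()
print(arr.dtype)  # object

final = pd.DataFrame(arr, columns=['col1', 'col2'])
print(final.dtypes)

# Re-infer per-column dtypes; col1 goes back to an integer dtype
final = final.infer_objects()
print(final.dtypes)
```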
Or use any of the other ways to rename (rename) or overwrite (set_axis) the existing columns.
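A short sketch of those two alternatives, again assuming the integer labels 0 and 43 from `header=None`: `rename` maps old labels to new ones, while `set_axis` replaces all column labels by position. Both return a new frame and leave `data` unchanged.

```python
import pandas as pd

data = pd.DataFrame({
    0: [57669557, 57779240, 96951148],
    43: ['2020-02-15', '2017-02-15', '2018-07-24'],
})

# rename: explicit old -> new mapping (labels not in the mapping are kept)
renamed = data.rename(columns={0: 'col1', 43: 'col2'})

# set_axis: positional replacement of every column label
relabelled = data.set_axis(['col1', 'col2'], axis=1)

print(renamed.equals(relabelled))  # True
# Either frame can then be written with .to_csv('finalFile.csv', index=False)
```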
These approaches produce the expected finalFile.csv:
col1,col2
57669557,2020-02-15
57779240,2017-02-15
96951148,2018-07-24
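To check the result without opening the file, you can call `to_csv` with no path; it then returns the CSV content as a string. A minimal sketch with the column-overwrite approach on the sample values from the question:

```python
import pandas as pd

data = pd.DataFrame({
    0: [57669557, 57779240, 96951148],
    43: ['2020-02-15', '2017-02-15', '2018-07-24'],
})

final = data.copy()
final.columns = ['col1', 'col2']  # overwrite the labels 0 and 43

# With no path argument, to_csv returns the text instead of writing a file
csv_text = final.to_csv(index=False)
print(csv_text)
```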
This toy example shows columns selecting from the existing DataFrame's labels:
import pandas as pd
data = pd.DataFrame({
0: [57669557, 57779240, 96951148],
43: ['2020-02-15', '2017-02-15', '2018-07-24']
})
print(data)
final = pd.DataFrame(data, columns=[43])
print(final)
Program output:
# data
0 43
0 57669557 2020-02-15
1 57779240 2017-02-15
2 96951148 2018-07-24
# final (Only column 43 was selected)
43
0 2020-02-15
1 2017-02-15
2 2018-07-24
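For contrast, a sketch on the same toy `data`: with labelled input, `columns` selects (and can reorder) existing labels, whereas an ndarray has no labels to select from, so `columns` must supply exactly one name per column. Passing a single label for a two-column array raises a `ValueError` rather than selecting.

```python
import pandas as pd

data = pd.DataFrame({
    0: [57669557, 57779240, 96951148],
    43: ['2020-02-15', '2017-02-15', '2018-07-24'],
})

# Labelled input: columns selects and reorders existing labels
reordered = pd.DataFrame(data, columns=[43, 0])
print(reordered.columns.tolist())  # [43, 0]

# Unlabelled input: columns must name every column of the array
raised = False
try:
    pd.DataFrame(data.to_numpy(), columns=[43])
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```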
Upvotes: 1