Reputation: 2253
Just a quick question: with the pandas to_csv() function I saved a DataFrame as a .csv file with this structure:
In:
df.to_csv(output_file, sep = '|')
Out:
|id|column2|column3
0|id_1|bla bla bla bla|more strings
1|id_2|bla bla bla bla|more strings
2|id_3|bla bla bla bla|more strings
....
n-1|id_n|bla bla bla bla| more strings
The problem with this file is its format: as you can see, there is an unwanted column on the left side of the .csv file:
|id|
0|
1|
2|
....
n-1|
At first, I tried to simply drop that column, which does not actually have a name, by doing:
df.drop('',axis=1)
print list(df.columns.values)
['id', 'column2', 'column3']
However, it did not work. How can I restructure the previous .csv file into something like this with the to_csv() function?
id|column2|column3
id_1|bla bla bla bla|more strings
id_2|bla bla bla bla|more strings
id_3|bla bla bla bla|more strings
....
id_n|bla bla bla bla|more strings
update
Following @piRSquared's answer, I tried to reformat the csv file as follows:
print list(df.columns.values)
return df.to_csv(output_file, sep='|', index_col=1)[['column1','column2', 'column3']]
#return df.to_csv(output_file, sep = '|')
Nevertheless, I got this:
['id', 'content', 'POS-tagged_content']
Traceback (most recent call last):
  File "script.py", line 48, in <module>
    preprocess_files(input_file, output_file)
  File "script.py", line 39, in postag_pandas
    return df.to_csv(output_file, sep='|', index_col=1)[['column1','column2', 'column3']]
TypeError: 'NoneType' object has no attribute '__getitem__'
Upvotes: 0
Views: 1574
Reputation: 294218
Try:
df.set_index('id')
Where df is your dataframe.
IIUC
What you've provided is the text from a csv file, and you are importing it into a pandas dataframe. It's confusing when you say:
How can I restructure the previous dataframe into something like this?:
I believe you've confused what a dataframe is with what a csv is.
A csv is text, or a file containing text, that is to be parsed. Typically, the values in this text are separated by commas (Comma Separated Values).
A dataframe, in a pandas/python context, is a python object.
All that said, I believe you meant to ask:
How can I import a csv with this text such that I don't get the first column?
from io import StringIO

import pandas as pd

text = """|id|column2|column3
0|id_1|bla bla bla bla|more strings
1|id_2|bla bla bla bla|more strings
2|id_3|bla bla bla bla|more strings
n-1|id_n|bla bla bla bla| more strings"""
# Use the 'id' field (position 1) as the index and keep only the data columns.
df = pd.read_csv(StringIO(text), sep='|', index_col=1)[['column2', 'column3']]
print(df)
Looks like:
              column2        column3
id
id_1  bla bla bla bla   more strings
id_2  bla bla bla bla   more strings
id_3  bla bla bla bla   more strings
id_n  bla bla bla bla   more strings
From here you can save to a csv like this:
df.to_csv('./mycsv.csv')
produces
id,column2,column3
id_1,bla bla bla bla,more strings
id_2,bla bla bla bla,more strings
id_3,bla bla bla bla,more strings
id_n,bla bla bla bla, more strings
Which is what you said you wanted.
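If the goal is the pipe-separated layout from the question rather than commas, the same approach should work by passing sep='|' to to_csv as well. A minimal sketch, assuming the sample data above (shortened to three rows), Python 3 and a recent pandas. It writes to a string instead of a file so the result is easy to inspect:

```python
import io

import pandas as pd

text = """|id|column2|column3
0|id_1|bla bla bla bla|more strings
1|id_2|bla bla bla bla|more strings
2|id_3|bla bla bla bla|more strings"""

# Parse the pipe-separated text, using the 'id' field (position 1) as the index.
df = pd.read_csv(io.StringIO(text), sep='|', index_col=1)[['column2', 'column3']]

# With no path given, to_csv returns the csv text as a string.
# The index is named 'id', so it is written as the first column.
csv_text = df.to_csv(sep='|')
print(csv_text)
```

Passing a file path as the first argument instead would write the same text to disk.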
Upvotes: 2
Reputation: 393963
It looks like you have a blank string as the name of one of the columns; you can drop it:
In [47]:
df = pd.DataFrame(np.random.randn(5,2), columns=['','asd'])
df
Out[47]:
                  asd
0 -0.911575 -0.142538
1  0.746839 -1.504157
2  0.611362  0.400219
3 -0.959443  1.494226
4 -0.346508 -1.471558
In [48]:
df.drop('',axis=1)
Out[48]:
        asd
0 -0.142538
1 -1.504157
2  0.400219
3  1.494226
4 -1.471558
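Two details may matter here, so here is a small sketch (assuming Python 3 and a recent pandas; the `plain` frame is a made-up stand-in for the question's data). First, drop returns a new DataFrame and leaves df untouched, so the result has to be assigned. Second, when no column is literally named '', the empty first header field in the csv most likely comes from the unnamed index that to_csv writes by default, in which case there is nothing for drop to remove:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=['', 'asd'])

# drop does not modify df; it returns a new DataFrame without the column.
dropped = df.drop('', axis=1)
print(list(df.columns))       # still contains the blank name
print(list(dropped.columns))  # only 'asd' remains

# Hypothetical stand-in for the question's frame: no column is named ''.
plain = pd.DataFrame({'id': ['id_1', 'id_2'], 'column2': ['bla', 'bla']})
has_blank_column = '' in plain.columns

# The header written by to_csv still starts with an empty field,
# because the unnamed index is written as the first column.
header = plain.to_csv(sep='|').splitlines()[0]
print(has_blank_column, header)
```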
Upvotes: 1
Reputation: 50540
print df.to_string(index=False)
This will print your dataframe without the index.
>>> print df
     id          column2       column3
0  id_1  bla bla bla bla  more strings
1  id_2  bla bla bla bla  more strings
2  id_3  bla bla bla bla  more strings
>>> print df.to_string(index=False)
  id          column2       column3
id_1  bla bla bla bla  more strings
id_2  bla bla bla bla  more strings
id_3  bla bla bla bla  more strings
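to_csv accepts the same index=False option, which is probably the most direct way to get the file layout asked for in the question. A minimal sketch, assuming Python 3, a recent pandas, and a made-up temporary path for the output file. It also shows that to_csv returns None when it is given a path, which would explain the TypeError in the question's update:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'id': ['id_1', 'id_2', 'id_3'],
    'column2': ['bla bla bla bla'] * 3,
    'column3': ['more strings'] * 3,
})

# Hypothetical output location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'output.csv')

# index=False leaves the row index out of the file.
# When a path is given, to_csv writes the file and returns None.
result = df.to_csv(path, sep='|', index=False)

with open(path) as fh:
    written = fh.read()
print(written)
```

Since the return value is None, any column selection has to happen on the DataFrame before calling to_csv, not on its result.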
Upvotes: 2