Reputation: 2253

How to reformat a .csv file with a pandas dataframe?

Just a quick question: with the pandas to_csv() function I saved a pandas dataframe as a .csv file with this structure:

In:

df.to_csv(output_file, sep = '|')

Out:

|id|column2|column3
0|id_1|bla bla bla bla|more strings
1|id_2|bla bla bla bla|more strings
2|id_3|bla bla bla bla|more strings
....
n-1|id_n|bla bla bla bla| more strings

The problem with this file is its format: as you can see, there is an unwanted column on the left side of the .csv file:

|id|
0|
1|
2|
....
n-1|

At first, I tried to simply drop that column, which does not actually have a name, by doing:

df.drop('',axis=1)
print list(df.columns.values)
['id', 'column2', 'column3']

However, it did not work. How can I restructure the previous .csv file into something like this with the to_csv() function?

id|column2|column3
id_1|bla bla bla bla|more strings
id_2|bla bla bla bla|more strings
id_3|bla bla bla bla|more strings
....
id_n|bla bla bla bla|more strings

Update

Following @piRSquared's answer, I tried to reformat the csv file as follows:

print list(df.columns.values)
return df.to_csv(output_file, sep='|', index_col=1)[['column1','column2', 'column3']]
#return df.to_csv(output_file, sep = '|')

Nevertheless, I got this:

['id', 'content', 'POS-tagged_content']
Traceback (most recent call last):
  File "script.py", line 48, in <module>
    preprocess_files(input_file, output_file)
  File "script.py", line 39, in postag_pandas
    return df.to_csv(output_file, sep='|', index_col=1)[['column1','column2', 'column3']]
TypeError: 'NoneType' object has no attribute '__getitem__'

Upvotes: 0

Answers (3)

piRSquared

Reputation: 294218

Try:

df.set_index('id')

Where df is your dataframe

Edit

IIUC

What you've provided is the text from a csv file, and you are importing it into a pandas dataframe. It's confusing when you say:

How can I restructure the previous dataframe into something like this?:

I believe you've confused what is a dataframe and what is a csv.

A csv is text, or a file containing text, that is meant to be parsed. Typically, the values in this text are separated by commas (Comma Separated Values).

A dataframe, in a pandas/python context, is a python object.

All that said, I believe you meant to ask:

How can I import a csv with this text such that I don't get the first column?

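import pandas as pd
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

text = """|id|column2|column3
0|id_1|bla bla bla bla|more strings
1|id_2|bla bla bla bla|more strings
2|id_3|bla bla bla bla|more strings
n-1|id_n|bla bla bla bla| more strings"""

# Use the 'id' column (position 1) as the index, then keep only the data columns
df = pd.read_csv(StringIO(text), sep='|', index_col=1)[['column2', 'column3']]

print(df)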

Looks like:

              column2        column3
id                                  
id_1  bla bla bla bla   more strings
id_2  bla bla bla bla   more strings
id_3  bla bla bla bla   more strings
id_n  bla bla bla bla   more strings

From here you can save to a csv like this:

df.to_csv('./mycsv.csv')

This produces:

id,column2,column3
id_1,bla bla bla bla,more strings
id_2,bla bla bla bla,more strings
id_3,bla bla bla bla,more strings
id_n,bla bla bla bla, more strings

Which is what you said you wanted.
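
If the goal is the pipe-separated file from the question, one option is to pass index=False to to_csv() so the row labels are never written. Here is a minimal sketch, assuming a small stand-in dataframe and a temporary output path (both hypothetical, not the asker's real data). It also selects the columns before writing, because to_csv() returns None when it is given a path, so its result cannot be indexed with [[...]]:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the asker's dataframe.
df = pd.DataFrame({
    'id': ['id_1', 'id_2', 'id_3'],
    'column2': ['bla bla bla bla'] * 3,
    'column3': ['more strings'] * 3,
})

# Hypothetical output path inside a temporary directory.
output_file = os.path.join(tempfile.mkdtemp(), 'output.csv')

# Select the columns first, then write. index=False omits the row labels,
# so no unnamed leading column ends up in the file.
result = df[['id', 'column2', 'column3']].to_csv(output_file, sep='|', index=False)

# to_csv returns None when it writes to a path, so there is nothing to index.
print(result)

# Read the file back as plain text to inspect what was written.
with open(output_file) as handle:
    written = handle.read()
print(written)
```

With index=False the header line should start directly with id, and sep='|' keeps the separator from the question instead of the default comma.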

Upvotes: 2

EdChum

Reputation: 393963

It looks like one of your column names is a blank string; you can drop it:

In [47]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,2), columns=['','asd'])
df

Out[47]:
                  asd
0 -0.911575 -0.142538
1  0.746839 -1.504157
2  0.611362  0.400219
3 -0.959443  1.494226
4 -0.346508 -1.471558

In [48]:
df.drop('',axis=1)

Out[48]:
        asd
0 -0.142538
1 -1.504157
2  0.400219
3  1.494226
4 -1.471558
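
Note that in the question the frame itself may have no column named '' at all: the blank header cell is the index that to_csv() writes by default, so drop('') might have nothing to match. Here is a sketch of what seems to be going on, assuming a small stand-in dataframe (hypothetical data). When such a file is read back, pandas names the headerless column 'Unnamed: 0', and drop() returns a new frame rather than changing the original:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the asker's dataframe.
df = pd.DataFrame({
    'id': ['id_1', 'id_2'],
    'column2': ['bla bla bla bla', 'bla bla bla bla'],
    'column3': ['more strings', 'more strings'],
})

# By default to_csv writes the index as a first column with an empty header.
buffer = StringIO()
df.to_csv(buffer, sep='|')
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back turns the headerless column into 'Unnamed: 0'.
reloaded = pd.read_csv(StringIO(csv_text), sep='|')
print(list(reloaded.columns))

# drop returns a new dataframe, so the result has to be assigned.
cleaned = reloaded.drop('Unnamed: 0', axis=1)
print(list(cleaned.columns))
```

Without the assignment, reloaded keeps all of its columns, which might be another reason the original attempt appeared to do nothing.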

Upvotes: 1

Andy

Reputation: 50540

print df.to_string(index=False)

This will print your dataframe without the index.

>>> print df
     id          column2       column3
0  id_1  bla bla bla bla  more strings
1  id_2  bla bla bla bla  more strings
2  id_3  bla bla bla bla  more strings

>>> print df.to_string(index=False)
   id          column2       column3
 id_1  bla bla bla bla  more strings
 id_2  bla bla bla bla  more strings
 id_3  bla bla bla bla  more strings

Upvotes: 2
