neversaint

Reputation: 64004

How to convert a single column Pandas DataFrame into Series

I have the following data frame:

import pandas as pd
d = {'gene' : ['foo','bar'],'score' : [4., 3.,]}
df = pd.DataFrame(d)
df.set_index('gene',inplace=True)

Which produces:

In [56]: df
Out[56]:
      score
gene
foo       4
bar       3
In [58]: type(df)
Out[58]: pandas.core.frame.DataFrame

What I want to do is turn it into a Series. I expect it to return:

gene
foo       4
bar       3
#pandas.core.series.Series

I tried this but it doesn't work:

In [64]: type(df.iloc[0:,])
Out[64]: pandas.core.frame.DataFrame

In [65]: df.iloc[0:,]
Out[65]:
      score
gene
foo       4
bar       3

What's the right way to do it?

Upvotes: 10

Views: 16996

Answers (4)

mins

Reputation: 7504

With pandas 2.2.1, each of these expressions returns the score column as a Series:

  • df.score --> access single column as attribute
  • df['score'] --> access column(s) by label(s)
  • df.get('score') --> get column item(s)

Plus regular indexing of a column:

  • df.iloc[:, 0] --> access items by positions
  • df.loc[:, 'score'] --> access items by labels

df['score'] is the most logical way, and if your column name is a valid identifier, it can be shortened to df.score. This shortcut is fragile, though: it breaks when the column name collides with an existing DataFrame attribute or method.

This also works for MultiIndex columns: replace the label with a tuple of labels.


import pandas as pd
d = {'gene' : ['foo','bar'],'score' : [4., 3.,]}
df = pd.DataFrame(d)
df.set_index('gene',inplace=True)

print(df.score)
print(df['score'])
print(df.iloc[:,0])
print(df.loc[:,'score'])
print(df.get('score'))

All result in:

gene
foo    4.0
bar    3.0
Name: score, dtype: float64
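
To illustrate the MultiIndex remark, here is a minimal sketch. The two-level column index and the name df_multi are made up for this example and are not part of the question's data. A full tuple of labels selects exactly one column, so a Series comes back; a partial key such as df_multi['score'] would still give a DataFrame.

```python
import pandas as pd

# Hypothetical two-level column index, invented for illustration only.
columns = pd.MultiIndex.from_tuples([("score", "raw"), ("score", "scaled")])
df_multi = pd.DataFrame(
    [[4.0, 0.8], [3.0, 0.6]],
    index=pd.Index(["foo", "bar"], name="gene"),
    columns=columns,
)

# A complete tuple of labels identifies a single column, so each
# of these selections is a Series rather than a DataFrame.
s_bracket = df_multi[("score", "raw")]
s_loc = df_multi.loc[:, ("score", "raw")]

# A partial key (first level only) keeps both sub-columns: still a DataFrame.
partial = df_multi["score"]

print(type(s_bracket).__name__)
print(type(partial).__name__)
```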

Upvotes: 0

Aman khan Roohani

Reputation: 159

Swapping the indices would solve the problem easily:

In [64]: type(df.iloc[0:,])
Out[64]: pandas.core.frame.DataFrame

In [65]: df.iloc[:, 0]  # swapped the indices
Out[65]:
gene
foo    4.0
bar    3.0
Name: score, dtype: float64

Upvotes: 1

Alexander

Reputation: 109546

>>> s = df.squeeze()
>>> s
gene
foo    4
bar    3
Name: score, dtype: float64

To get it back to a dataframe:

>>> s.to_frame()
      score
gene       
foo       4
bar       3
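
One caveat, sketched below under the assumption that the frame might sometimes hold only a single row: squeeze() with no arguments collapses every length-1 axis, so a 1x1 frame becomes a scalar rather than a Series. Passing axis=1 restricts the squeeze to the column axis. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"gene": ["foo", "bar"], "score": [4.0, 3.0]}).set_index("gene")

# Squeeze only the column axis: a single-column frame becomes a Series.
s = df.squeeze(axis=1)

# Edge case: a frame with one row AND one column.
one_row = df.iloc[[0]]

# No axis given: both length-1 axes collapse, leaving a bare scalar.
scalar = one_row.squeeze()

# axis=1: only the columns collapse, so the result is still a Series.
still_series = one_row.squeeze(axis=1)

print(type(s).__name__, type(still_series).__name__)
```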

Upvotes: 29

honza_p

Reputation: 2093

Try swapping the indices in the brackets:

df.iloc[:,0]

This should work.
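
For anyone wondering why the order matters, here is a small sketch using the question's data. In iloc the first position selects rows and the second selects columns, so df.iloc[0:,] only slices rows and keeps every column, while an integer in the column position picks out one column as a Series. The variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"gene": ["foo", "bar"], "score": [4.0, 3.0]}).set_index("gene")

# Row slice only: all rows from position 0, all columns -> DataFrame.
rows_only = df.iloc[0:, ]

# All rows, integer column position -> Series.
first_col = df.iloc[:, 0]

# All rows, list of column positions -> DataFrame (the list keeps 2-D shape).
first_col_frame = df.iloc[:, [0]]

print(type(rows_only).__name__)
print(type(first_col).__name__)
print(type(first_col_frame).__name__)
```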

Upvotes: 11
