darkpool

Reputation: 14641

Create series from one row of dataframe

I have the following dataframe:

Symbol, col1, col2, col3
abc,    435,  5465, 675
xyz,    565,  45,   567
mno,    675,  456,  789

I would like to select a specific row based on Symbol, with the result being a pandas Series. For example, selecting xyz should give me the following Series:

Symbol, col1, col2, col3
xyz,    565,  45,   567

I have put logic rules in place so that Symbol should always be unique. Purely out of interest, though: what would happen if Symbol were not unique? Would there hypothetically be a way to handle that?

Upvotes: 1

Views: 5867

Answers (2)

numentar

Reputation: 1079

If the index value is not unique, you get a DataFrame instead of a Series:

import pandas as pd

data = [['Tokyo', 'London', 'New York', 'Manchester'],
        ['Japan', 'UK', 'US', 'UK'],
        ['Asia', 'Europe', 'North America', 'Europe']]

df = pd.DataFrame(data).transpose()
df.columns = ['City','Country','Continent']
df2 = df.set_index('City')

Selecting Tokyo gives a Series:

print(df2.loc['Tokyo'])
print(type(df2.loc['Tokyo']))

Country      Japan
Continent     Asia
Name: Tokyo, dtype: object

<class 'pandas.core.series.Series'>

If the index is instead a non-unique column, say Country:

df3 = df.set_index('Country')

Then you get a DataFrame:

print(df3.loc['UK'])
print(type(df3.loc['UK']))

                City Continent
Country                       
UK            London    Europe
UK       Manchester    Europe
<class 'pandas.core.frame.DataFrame'>

So I'm not sure what you mean by handling such a case without discarding some data.
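
If discarding the extra rows is acceptable, here is a minimal sketch of two ways to end up with a single Series anyway. It rebuilds the city data from above; the variable names (uk_rows, first_uk, deduped, uk_series) are only illustrative, and keeping the first match is an arbitrary choice that does drop the other rows.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Tokyo', 'London', 'New York', 'Manchester'],
    'Country': ['Japan', 'UK', 'US', 'UK'],
    'Continent': ['Asia', 'Europe', 'North America', 'Europe'],
})
df3 = df.set_index('Country')

# Option 1: pass the label in a list so .loc always returns a DataFrame,
# then pick one row by position. The other UK row is ignored.
uk_rows = df3.loc[['UK']]
first_uk = uk_rows.iloc[0]

# Option 2: drop repeated index labels up front (keeping the first
# occurrence), after which a plain .loc lookup returns a Series.
deduped = df3[~df3.index.duplicated(keep='first')]
uk_series = deduped.loc['UK']

print(first_uk)
print(type(uk_series))
```

Either way the Manchester row is lost, so this only makes sense if one row per label is really all you need.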

Upvotes: 0

JoeCondron

Reputation: 8906

Assuming Symbol is the DataFrame index, simply select the row you want using DataFrame.loc:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3, 3), 
                  index=['abc', 'xyz', 'mno'], 
                  columns=['col1', 'col2', 'col3'])
df
     col1  col2  col3
abc     0     1     2
xyz     3     4     5
mno     6     7     8

In [21]: df.loc['xyz']
Out[21]:
col1    3
col2    4
col3    5


In [22]: isinstance(df.loc['xyz'], pd.Series)
Out[22]:
True

A single row or column of a DataFrame is a Series. For example, to select the first column, use df['col1'].
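
To make the row/column symmetry concrete, here is a small sketch using the same 3x3 frame as above. The names row and col are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=['abc', 'xyz', 'mno'],
                  columns=['col1', 'col2', 'col3'])

# A row lookup gives a Series indexed by the column names;
# its .name is the row label that was selected.
row = df.loc['xyz']

# A column lookup gives a Series indexed by the row labels;
# its .name is the column name.
col = df['col1']

print(row.name, list(row.index))
print(col.name, list(col.index))
```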

If 'Symbol' is not the index, you can set it as the index or use the following boolean key method:

df[df.Symbol == 'xyz']

which is also equivalent to

df.loc[df.Symbol == 'xyz']

The second form, with .loc, is the one to use for assignment with boolean keys. Note that both forms return a DataFrame, even when only one row matches.
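
Here is a sketch of the boolean-key route, assuming Symbol is an ordinary column as in the question's data. It shows one way to get from the filtered result to a Series, plus an assignment through .loc; the names mask, matched and row are illustrative, and setting col1 to 0 is just an arbitrary example value.

```python
import pandas as pd

df = pd.DataFrame({'Symbol': ['abc', 'xyz', 'mno'],
                   'col1': [435, 565, 675],
                   'col2': [5465, 45, 456],
                   'col3': [675, 567, 789]})

# Boolean key: True for the rows whose Symbol is 'xyz'.
mask = df['Symbol'] == 'xyz'

# Filtering with a boolean key yields a DataFrame (one row here),
# so take the first row by position to get a Series.
matched = df.loc[mask]
row = matched.iloc[0]
print(row)

# Assignment through a boolean key: .loc[mask, column] writes into df itself.
df.loc[mask, 'col1'] = 0
print(df)
```

If the mask matches nothing, matched is empty and .iloc[0] raises an IndexError, so check len(matched) first when that can happen.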

As for a non-unique index, calling df.loc with a repeated label returns a DataFrame containing all rows with that label:

In [23]:

df = pd.DataFrame(np.arange(12).reshape(4, 3), 
                  index=['abc', 'xyz', 'mno', 'xyz'], 
                  columns=['col1', 'col2', 'col3'])

In [24]: df.loc['xyz']
Out[24]:
     col1  col2  col3
xyz     3     4     5
xyz     9    10    11
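
If you would rather fail loudly than silently get a DataFrame, one possible approach is to check the index or the type of the result. This is only a sketch: row_as_series is a hypothetical helper name, not part of pandas, while Index.is_unique is a real pandas attribute.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['abc', 'xyz', 'mno', 'xyz'],
                  columns=['col1', 'col2', 'col3'])

# Quick check of whether any index label is repeated.
print(df.index.is_unique)


def row_as_series(frame, label):
    """Hypothetical helper: return the row for `label` as a Series,
    raising ValueError if several rows share that label."""
    result = frame.loc[label]
    if isinstance(result, pd.DataFrame):
        raise ValueError(f"{len(result)} rows share the label {label!r}")
    return result


print(row_as_series(df, 'abc'))   # 'abc' is unique, so this is a Series

try:
    row_as_series(df, 'xyz')      # 'xyz' appears twice
except ValueError as exc:
    print(exc)
```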

Upvotes: 2
