Reputation: 14641
I have the following dataframe:
Symbol, col1, col2, col3
abc, 435, 5465, 675
xyz, 565, 45, 567
mno, 675, 456, 789
I would like to select a specific row based on Symbol, with the result being a pandas Series. For example, selecting xyz should give me the following Series:
Symbol, col1, col2, col3
xyz, 565, 45, 567
I have put logic rules in place so that Symbol should always be unique. But purely out of interest, what would happen if Symbol were not unique? Would there hypothetically be a way to handle that?
Upvotes: 1
Views: 5867
Reputation: 1079
If the index value is not unique you get a dataframe instead of a Series:
import pandas as pd
data = [['Tokyo', 'London', 'New York', 'Manchester'],
        ['Japan', 'UK', 'US', 'UK'],
        ['Asia', 'Europe', 'North America', 'Europe']]
df = pd.DataFrame(data).transpose()
df.columns = ['City','Country','Continent']
df2 = df.set_index('City')
Selecting Tokyo gives a series:
print(df2.loc['Tokyo'])
print(type(df2.loc['Tokyo']))
Country Japan
Continent Asia
Name: Tokyo, dtype: object
<class 'pandas.core.series.Series'>
If the indexing is by, say, country:
df3 = df.set_index('Country')
Then you get a dataframe:
print(df3.loc['UK'])
print(type(df3.loc['UK']))
               City Continent
Country
UK           London    Europe
UK       Manchester    Europe
<class 'pandas.core.frame.DataFrame'>
So I'm not sure what you mean by handling such a case without ditching some data.
Upvotes: 0
Reputation: 8906
Assuming Symbol is the DataFrame index, simply select the row you want using DataFrame.loc:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=['abc', 'xyz', 'mno'],
                  columns=['col1', 'col2', 'col3'])
df
     col1  col2  col3
abc     0     1     2
xyz     3     4     5
mno     6     7     8
In [21]: df.loc['xyz']
Out[21]:
col1 3
col2 4
col3 5
In [22]:
isinstance(df.loc['xyz'], pd.Series)
Out[22]:
True
A single row or column of a DataFrame is a Series. For example, to select the first column, simply call df['col1'].
If 'Symbol' is not the index, you can set it as the index or use the following boolean key method:
df[df.Symbol == 'xyz']
which is equivalent to
df.loc[df.Symbol == 'xyz']
This second method is useful for assignment using boolean keys.
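As a minimal sketch of both uses, assuming Symbol is an ordinary column (the frame below is rebuilt from the question's data, and the names mask, selected and row are only illustrative): a boolean key returns a DataFrame even when a single row matches, so taking .iloc[0] is one way to get a Series, and df.loc[mask, column] = value assigns to the matching rows.

```python
import pandas as pd

# Data from the question, with Symbol as an ordinary column (not the index).
df = pd.DataFrame({
    'Symbol': ['abc', 'xyz', 'mno'],
    'col1': [435, 565, 675],
    'col2': [5465, 45, 456],
    'col3': [675, 567, 789],
})

mask = df['Symbol'] == 'xyz'

# Boolean selection returns a DataFrame, even when only one row matches.
selected = df.loc[mask]

# Take the first (here, only) matching row to get a Series.
row = selected.iloc[0]

# Assignment with a boolean key: set col1 to 0 for every matching row.
df.loc[mask, 'col1'] = 0
```

If Symbol could match more than one row, .iloc[0] silently keeps only the first match, so it may be worth checking len(selected) first.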
As for a non-unique index, calling df.loc will return a DataFrame containing all rows that share that index label:
In [23]:
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['abc', 'xyz', 'mno', 'xyz'],
                  columns=['col1', 'col2', 'col3'])
In [24]:
df.loc['xyz']
Out[24]:
     col1  col2  col3
xyz     3     4     5
xyz     9    10    11
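If a Series is still wanted when the index may contain duplicates, one possible approach (a sketch, not the only way) is to check Index.is_unique and drop repeated labels with Index.duplicated before selecting. Keeping the first occurrence is an assumption made here purely for illustration; which duplicate to keep, if any, depends on the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['abc', 'xyz', 'mno', 'xyz'],
                  columns=['col1', 'col2', 'col3'])

# False here, because 'xyz' appears twice in the index.
unique_before = df.index.is_unique

# Keep only the first row for each label (an arbitrary choice for this sketch).
deduped = df[~df.index.duplicated(keep='first')]

# With a unique index, .loc with a single label returns a Series again.
row = deduped.loc['xyz']
```

Note that this discards the second xyz row, so it only makes sense when the duplicates are known to be redundant.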
Upvotes: 2