Imad

Reputation: 2741

pandas .loc returns inconsistent types

Say I have two DataFrames:

df1 = pd.DataFrame(["2020-12-31", "2021-01-01"], columns=["date"], index=["23845940781720275", "23845940781720275"])

and

df2 = pd.DataFrame(["2020-12-31"], columns=["date"], index=["23845940781720275"])

I want a single way to enumerate the items in the "date" column that works in both cases.

When I try the following, I get inconsistent results:

> type(df1.loc["23845940781720275"]["date"])

<class 'pandas.core.series.Series'>

> df1.loc["23845940781720275"]["date"]

23845940781720275    2020-12-31
23845940781720275    2021-01-01
Name: date, dtype: object

> type(df2.loc["23845940781720275"]["date"])

<class 'str'>

> df2.loc["23845940781720275"]["date"]

'2020-12-31'

I found some posts suggesting df.loc[x][['column']] to always get a DataFrame, but when I use it I get the same kind of inconsistency:

> type(df1.loc["23845940781720275"][["date"]])

<class 'pandas.core.frame.DataFrame'>

> type(df2.loc["23845940781720275"][["date"]])

<class 'pandas.core.series.Series'>

My real-world use case is easier and more readable with pandas. Is there a fix?

Upvotes: 2

Views: 345

Answers (2)

Nk03

Reputation: 14949

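I guess it's because the label matches only one row in the second dataframe, so df2.loc["23845940781720275"] returns that row as a Series instead of a DataFrame.

In the second case, type(df2.loc["23845940781720275"][["date"]]),

you are selecting a list of labels from that row Series, which gives a one-element Series. It's not a DataFrame because the row was already reduced to a Series in the first step.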
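
To see where the difference comes from, here is a small sketch comparing a scalar label with a one-element list of labels in .loc. The variable names (key, scalar_dup, and so on) are only illustrative, and the frames are rebuilt with a dict so the snippet runs on its own.

```python
import pandas as pd

key = "23845940781720275"
df1 = pd.DataFrame({"date": ["2020-12-31", "2021-01-01"]}, index=[key, key])
df2 = pd.DataFrame({"date": ["2020-12-31"]}, index=[key])

# Scalar label: the result type depends on how many rows match.
scalar_dup = df1.loc[key]       # two matching rows -> DataFrame
scalar_unique = df2.loc[key]    # one matching row  -> Series (the row itself)

# List of labels: the result is a DataFrame in both cases.
list_dup = df1.loc[[key]]
list_unique = df2.loc[[key]]

print(type(scalar_dup).__name__, type(scalar_unique).__name__)
print(type(list_dup).__name__, type(list_unique).__name__)
```

The first print shows two different types, the second shows the same type twice, which is why wrapping the label in a list removes the inconsistency.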

If you want to remove the inconsistency, pass the label as a list:

type(df2.loc[["23845940781720275"]][["date"]]) # pandas.core.frame.DataFrame

To fetch a list of dates for an index label, use:

df2.loc[["23845940781720275"]][["date"]]["date"].values.tolist()
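
The same lookup can probably be written in one .loc call by passing the row labels as a list and the column as a scalar. A minimal sketch, assuming the column is called "date" as in the question; dates_for is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def dates_for(df, key, column="date"):
    """Return the values of `column` for every row labelled `key`, as a list."""
    # List of row labels + scalar column label -> always a Series,
    # whether the label matches one row or several.
    return df.loc[[key], column].tolist()

key = "23845940781720275"
df1 = pd.DataFrame({"date": ["2020-12-31", "2021-01-01"]}, index=[key, key])
df2 = pd.DataFrame({"date": ["2020-12-31"]}, index=[key])

print(dates_for(df1, key))
print(dates_for(df2, key))
```

Note that this raises a KeyError if the label is not in the index at all.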

Upvotes: 2

Imad

Reputation: 2741

Here's a (dirty) fix to extract the dates from the dataframes:

[date_str.strip() for date_str in df1.loc[['23845940781720275']][['date']].to_string().strip().split(f"\n{'23845940781720275'}")[1:]]

returns : ['2020-12-31', '2021-01-01']

[date_str.strip() for date_str in df2.loc[['23845940781720275']][['date']].to_string().strip().split(f"\n{'23845940781720275'}")[1:]]

returns : ['2020-12-31']
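
A possibly cleaner alternative that avoids parsing the string output is to select rows with a boolean mask on the index. This is only a sketch under the same assumptions as the question's data; dates_matching is a hypothetical helper name.

```python
import pandas as pd

def dates_matching(df, key, column="date"):
    """Return `column` values for rows whose index label equals `key`."""
    # A boolean mask keeps the result a Series regardless of how many
    # rows match, including zero.
    mask = df.index == key
    return df.loc[mask, column].tolist()

key = "23845940781720275"
df1 = pd.DataFrame({"date": ["2020-12-31", "2021-01-01"]}, index=[key, key])
df2 = pd.DataFrame({"date": ["2020-12-31"]}, index=[key])

print(dates_matching(df1, key))
print(dates_matching(df2, key))
print(dates_matching(df2, "missing"))
```

Unlike a label lookup, the mask version returns an empty list for a label that is not in the index instead of raising.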

Upvotes: 0
