Serena

Reputation: 75

How to check whether positional indices in a list exist in a section of the corresponding DataFrame

I have a DataFrame like this:

Date          A      B      C
2021-08-20    1      2      3
2021-08-21    2      3      4
2021-08-22    3      4      5
2021-08-23    4      5      6
2021-08-24    7      8      9
2021-08-25    10     11     12
2021-08-26    11     12     13
2021-08-28    12     13     14

My "target" section is dates from 2021-08-21 to 2021-08-24.

Now I have a list of positional indices:

A = [0, 1, 3, 4, 6, 7]

What I'm trying to do is create a new list containing only those positional indices that fall inside my target section, and then find the total number of elements in the new list.

Target answer:

new_list = [1, 3, 4]
print(len(new_list))
3

I've tried this so far:

new_list = []
df_range = df.loc['2021-08-21':'2021-08-24']

for data_idx in A:
    if data_idx == df_range.iloc[data_idx]:
        new_list.append(data_idx)
print(len(new_list))

But I get an IndexError (single positional indexer is out-of-bounds), or a KeyError in a similar attempt. I believe the error occurs when the program tries to look up positions that fall outside this range?

Thank you in advance and sorry if anything is confusing. I know there should be an easy way to do this but I just can't figure it out.

Upvotes: 1

Views: 51

Answers (3)

Scott Boston

Reputation: 153510

If 'Date' is the index of the DataFrame and its dtype is a DatetimeIndex, then we can use pd.Index.get_indexer to get the positions of the target dates and a set intersection to find the overlap with A.

import pandas as pd

# Copy the DataFrame from the question above to the clipboard first
df = pd.read_clipboard(index_col=[0])

df.index = pd.to_datetime(df.index)
idx = df.index.get_indexer(pd.date_range('2021-08-21', '2021-08-24', freq='D'))

A = [0, 1, 3, 4, 6, 7]
overlap = set(A) & set(idx.tolist())  # .tolist() yields plain Python ints

print(f'{overlap=} and {len(overlap)=}')

Output:

overlap={1, 3, 4} and len(overlap)=3
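
For readers without the clipboard data, here is a self-contained sketch of the same idea. The helper name positions_in_range is hypothetical (not part of pandas), and the DataFrame is rebuilt by hand from the question. It also assumes the index is unique, and it drops the -1 that get_indexer returns for dates missing from the index (for example 2021-08-27).

```python
import pandas as pd

# Rebuild the question's DataFrame with 'Date' as a DatetimeIndex.
dates = ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23",
         "2021-08-24", "2021-08-25", "2021-08-26", "2021-08-28"]
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 7, 10, 11, 12],
     "B": [2, 3, 4, 5, 8, 11, 12, 13],
     "C": [3, 4, 5, 6, 9, 12, 13, 14]},
    index=pd.to_datetime(dates),
)
df.index.name = "Date"


def positions_in_range(frame, positions, start, end):
    """Hypothetical helper: keep the positions whose row date lies in [start, end]."""
    wanted = pd.date_range(start, end, freq="D")
    idx = frame.index.get_indexer(wanted)   # -1 marks dates absent from the index
    present = set(idx[idx >= 0].tolist())   # discard the -1 placeholders
    return sorted(p for p in positions if p in present)


A = [0, 1, 3, 4, 6, 7]
new_list = positions_in_range(df, A, "2021-08-21", "2021-08-24")
print(new_list, len(new_list))
```

Filtering out -1 matters because -1 is itself a valid positional index (the last row), so leaving it in could produce a false match if the list of positions were later used with iloc.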

Upvotes: 1

Andrej Kesely

Reputation: 195553

IIUC:

A = [0, 1, 3, 4, 6, 7]

df["tmp"] = range(len(df))
x = df.loc["2021-08-21":"2021-08-24"]
print(x.loc[x["tmp"].isin(A), "tmp"].to_list())

Prints:

[1, 3, 4]
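
A possible variant that avoids adding a helper column to df, sketched under the assumption that the index is a DatetimeIndex (the DataFrame below is rebuilt by hand from the question with only column A): build a boolean mask over the index, turn it into positions with np.flatnonzero, and intersect with A.

```python
import numpy as np
import pandas as pd

# Assumed setup: the question's dates as a DatetimeIndex.
index = pd.to_datetime(["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23",
                        "2021-08-24", "2021-08-25", "2021-08-26", "2021-08-28"])
df = pd.DataFrame({"A": [1, 2, 3, 4, 7, 10, 11, 12]}, index=index)

A = [0, 1, 3, 4, 6, 7]

# True for every row whose date lies inside the target section (inclusive).
mask = (df.index >= pd.Timestamp("2021-08-21")) & (df.index <= pd.Timestamp("2021-08-24"))

# Positions of the True entries, relative to the full DataFrame.
target_positions = np.flatnonzero(mask)

# Sorted positions that appear both in A and in the target section.
new_list = np.intersect1d(A, target_positions).tolist()
print(new_list, len(new_list))
```

Unlike the label slice with .loc, the mask does not require the index to be sorted.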

Upvotes: 1

Jorge Alvarez

Reputation: 104

If I understood the question, you want a list of positional indexes corresponding to the rows of df_range? If so, either of these two approaches is commonly used for that:

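df_range = df.loc['2021-08-21':'2021-08-24']

# Approach 1: enumerate the rows via the index
# (iterating over the DataFrame itself would yield column labels)
new_list = []
for i, _ in enumerate(df_range.index):
    new_list.append(i)

# Approach 2: iterate over a range of the row count
new_list = []
for i in range(len(df_range)):
    new_list.append(i)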
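
Note that positions obtained by counting over df_range are relative to the slice (they start at 0), whereas the positions in A refer to the full DataFrame. A rough sketch of how the two could be reconciled, assuming a unique, sorted DatetimeIndex so that the slice is one contiguous block of rows (the names offset, relative and absolute are illustrative only):

```python
import pandas as pd

# Assumed setup: the question's dates as a DatetimeIndex.
index = pd.to_datetime(["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23",
                        "2021-08-24", "2021-08-25", "2021-08-26", "2021-08-28"])
df = pd.DataFrame({"A": [1, 2, 3, 4, 7, 10, 11, 12]}, index=index)

df_range = df.loc["2021-08-21":"2021-08-24"]

# Positions relative to the slice: 0 .. len(df_range) - 1.
relative = list(range(len(df_range)))

# Position of the slice's first row within the full DataFrame.
offset = df.index.get_loc(df_range.index[0])

# Shift the relative positions so they refer to the full DataFrame.
absolute = [offset + i for i in relative]

A = [0, 1, 3, 4, 6, 7]
in_section = set(absolute)
new_list = [p for p in A if p in in_section]
print(new_list, len(new_list))
```

This also suggests why df_range.iloc[data_idx] in the question goes out of bounds: the slice has only four rows, so any position of 4 or more does not exist in it.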

Upvotes: 0
