maclian
maclian

Reputation: 439

Using for loop to grab values of one column based on the value of another column

I am trying to grab all the values of one column based on the value of another. I found some helpful stackoverflow questions already that are related to mine, but the solution in those don't seem to work on a variable range. Do I need to do something different for a variable?

I am trying to only grab the values of column 'open', from the dataset where the value of 'month' equals the month variable in the loop.

To be clear, the expected output is only the 'open' values.

for year in dfClose['year'].unique():
        tempYearDF = dfClose[dfClose['year'] == year]
        for month in range(1,13):
            tempOpenDF = tempYearDF.loc[tempYearDF['month'] == month, 'open']

I plan to do more manipulating to the tempOpenDF variable after assigning the data, but I first need to verify it is populating.

Sample data

dfClose

    open      year  month   day    date
0   30.490000   2010    1   4   2010-01-04
1   30.657143   2010    1   5   2010-01-05
2   30.625713   2010    1   6   2010-01-06
3   30.250000   2010    1   7   2010-01-07
4   30.042856   2010    1   8   2010-01-08
.
.
2551    297.260010  2020    2   24  2020-02-24
2552    300.950012  2020    2   25  2020-02-25
2553    286.529999  2020    2   26  2020-02-26
2554    281.100006  2020    2   27  2020-02-27
2555    257.260010  2020    2   28  2020-02-28

Output

tempOpenDF
Series([], Name: open, dtype: float64)

Data types

tempYearDF.dtypes

open     float64
year       int64
month      int64
day        int64
date      object
dtype: object

All the data for "year" is correctly separating, just having trouble grabbing month data now.

tempYearDF

    open    year    month   day date
2516    296.239990  2020    1   2   2020-01-02
2517    297.149994  2020    1   3   2020-01-03
2518    293.790009  2020    1   6   2020-01-06
2519    299.839996  2020    1   7   2020-01-07
2520    297.160004  2020    1   8   2020-01-08
2521    307.239990  2020    1   9   2020-01-09
2522    310.600006  2020    1   10  2020-01-10
2523    311.640015  2020    1   13  2020-01-13
2524    316.700012  2020    1   14  2020-01-14
2525    311.850006  2020    1   15  2020-01-15
2526    313.589996  2020    1   16  2020-01-16
2527    316.269989  2020    1   17  2020-01-17
2528    317.190002  2020    1   21  2020-01-21
2529    318.579987  2020    1   22  2020-01-22
2530    317.920013  2020    1   23  2020-01-23
2531    320.250000  2020    1   24  2020-01-24
2532    310.059998  2020    1   27  2020-01-27
2533    312.600006  2020    1   28  2020-01-28
2534    324.450012  2020    1   29  2020-01-29
2535    320.540009  2020    1   30  2020-01-30
2536    320.929993  2020    1   31  2020-01-31
2537    304.299988  2020    2   3   2020-02-03
2538    315.309998  2020    2   4   2020-02-04
2539    323.519989  2020    2   5   2020-02-05
2540    322.570007  2020    2   6   2020-02-06
2541    322.369995  2020    2   7   2020-02-07
2542    314.179993  2020    2   10  2020-02-10
2543    323.600006  2020    2   11  2020-02-11
2544    321.470001  2020    2   12  2020-02-12
2545    324.190002  2020    2   13  2020-02-13
2546    324.739990  2020    2   14  2020-02-14
2547    315.359985  2020    2   18  2020-02-18
2548    320.000000  2020    2   19  2020-02-19
2549    322.630005  2020    2   20  2020-02-20
2550    318.619995  2020    2   21  2020-02-21
2551    297.260010  2020    2   24  2020-02-24
2552    300.950012  2020    2   25  2020-02-25
2553    286.529999  2020    2   26  2020-02-26
2554    281.100006  2020    2   27  2020-02-27
2555    257.260010  2020    2   28  2020-02-28

If I use an actual value for the equals too, I get the results I want. But when I try use that value based on the range loop value, it breaks.

Good

tempYearDF.loc[tempYearDF['month'] == 1, 'open']

2516    296.239990
2517    297.149994
2518    293.790009
2519    299.839996
2520    297.160004
2521    307.239990
2522    310.600006
2523    311.640015

Upvotes: 3

Views: 4669

Answers (3)

VISQL
VISQL

Reputation: 2048

for month in range(1,13):
    tempOpenDF = tempYearDF.loc[tempYearDF['month'] == month, 'open']

loc = location, or "named" items

You might want iloc, but also, isn't tempYearDF['month'] an entire column?
You might want to refer to tempYearDF['month'].value or tempYearDF['month'].the_name_of_this_column (or whatever the appropriate method/attribute is).


df[df["month"] ==1] is a slice with 21 rows and all the columns df.loc[df["month"] ==1] is also a slice with 21 rows and all the columns 'df.loc[df["month"] ==1, "open" does return the 21 rows just in the open column when month equals 1.

Where are you saving this too? tempOpenDF is within the for loop. Its value will just change with every index of the loop.

I would have to see more of where this is passed to. As it stands you filter correctly, but sending that filtered data to nowhere.

What you have works otherwise.

import pandas as pd
df = pd.read_csv("sample_data.csv",sep='\t',parse_dates=["date"])
# sample data is what you provided above, using tab separation
#

some_year = 2020
print(df.loc[df["month"] == 1, 'open'],'\n')
print(df.loc[df["year"] == 2020, 'open'],'\n')
# print(df.loc[(df["month"] == 1 and df["year"] == 2020), 'open'])

for i in range(1,13):
    dfy = df.loc[df["year"] == 2020]
    mondata = dfy.loc[dfy["month"] == i, "open"]
    print("Month: ",i,'\n',mondata,"\n")

>>> df.head()
some_index open year month day date
0 2516 296.239990 2020 1 2 2020-01-02
1 2517 297.149994 2020 1 3 2020-01-03
2 2518 293.790009 2020 1 6 2020-01-06
3 2519 299.839996 2020 1 7 2020-01-07
4 2520 297.160004 2020 1 8 2020-01-08
true index is 0,1, etc. some_index came from your data.

Upvotes: 0

Marcus
Marcus

Reputation: 320

Sample data frame:

     0     1  2
0  123  2020  1
1  234  2020  2
2  543  2020  1
# For all unique years
for y in df[1].unique():
    # For all unique months
    for m in df[2].unique():
        # Get the row based on the month
        row = df.loc[df[2] == m]
            # Print only the desired column
            print(row[0])

Output:

0    123
2    543
Name: 0, dtype: int64
1    234
Name: 0, dtype: int64

Upvotes: 0

gold_cy
gold_cy

Reputation: 14216

Can't you just group by the year and month and then proceed from there?

for _, v in df.groupby(['year', 'month'])['open']:
    tempOpenDF = v
    # do stuff

Upvotes: 1

Related Questions