How to extract certain rows with a condition?

Question

I am working with a dataset whose first column contains emotion (category) labels. Because the dataset is unbalanced, I need to extract the same number of rows for each category. For example, if there are 10 categories, I need to select only 100 rows from each of them, giving 1000 rows in total.

What I tried:

def append_new_rows(df, new_df, s):
    c = 0
    for index, row in df.iterrows():
        if s == row[0]:
            if c <= 100:
                new_df.append(row)
                c += 1
    return df_2

for s in sorted(list(set(df.category))):
    new_df = append_new_rows(df, new_df, s)

Dataset

----------------------------
| category | A  | B  | C | D |
----------------------------
| happy    | ...| ...|...|...|
| ...      | ...| ...|...|...|
| sadness  | ...| ...|...|...|

Expected output

----------------------------
| category | A  | B  | C | D |
----------------------------
| happy    | ...| ...|...|...|
... 100 samples of happy
| ...      | ...| ...|...|...|
| sadness  | ...| ...|...|...|
... 100 samples of sadness
...
...
1000 sample rows in total

N34RM1K Zx · Accepted Answer

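You are almost there. In your version the result of append is never assigned (DataFrame.append returns a new frame instead of modifying new_df in place), the function returns df_2, which is not defined, and starting c at 0 with c <= 100 would let 101 rows through. You just need to do something like this:

import pandas as pd

def append_new_df(df, df_2, s, n):
    # Collect at most n rows whose first column equals s.
    rows = []
    for index, row in df.iterrows():
        if s == row.iloc[0] and len(rows) < n:
            rows.append(row)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    return pd.concat([df_2, pd.DataFrame(rows)])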
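
As an alternative to the row-by-row loop, the same selection can probably be done with a groupby. The following is only a minimal sketch, not the answer's method: the helper name first_n_per_category and the toy data are hypothetical, and it assumes the label column is named category as in the question's table.

```python
import pandas as pd

def first_n_per_category(df, column="category", n=100):
    # groupby(...).head(n) keeps the first n rows of each group
    # (fewer if a group has less than n rows) and keeps all columns.
    picked = df.groupby(column).head(n)
    # A stable sort puts the rows of each category next to each other
    # while keeping their original relative order.
    return picked.sort_values(column, kind="mergesort")

# Hypothetical toy data: three categories with unequal row counts.
df = pd.DataFrame({
    "category": ["happy"] * 5 + ["sadness"] * 3 + ["anger"] * 1,
    "A": range(9),
})

balanced = first_n_per_category(df, n=2)
print(balanced)
```

With n=100 on the real data this would return at most 100 rows per category. If random rows are wanted instead of the first ones, one option is to shuffle first, for example df.sample(frac=1, random_state=0), and pass the shuffled frame to the helper.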
