1974sb
1974sb

Reputation: 69

Generating sub data frame based on a value in an column

I have following data frame in pandas. Now I want to generate sub data frame if I see a value in Activity column. So for example, I want to have data frame with all the data with Name A IF Activity column as value 3 or 5.

Name    Date           Activity
A       01-02-2015       1
A       01-03-2015       2
A       01-04-2015       3
A       01-04-2015       1
B       01-02-2015       1
B       01-02-2015       2
B       01-03-2015       1
B       01-04-2015       5
C       01-31-2015       1
C       01-31-2015       2
C       01-31-2015       2

So for the above data, I want to get df_A as

Name    Date           Activity
A       01-02-2015       1
A       01-03-2015       2
A       01-04-2015       3
A       01-04-2015       1

df_B as

B       01-02-2015       1
B       01-02-2015       2
B       01-03-2015       1
B       01-04-2015       5

Since Name C does not have 3 or 5 in the column Activity, I do not want to get this data frame.

Also, the names in the data frame can vary with each input file.

Once I have these data frame separated, I want to plot a time series.

Upvotes: 3

Views: 2819

Answers (2)

Alexander
Alexander

Reputation: 109528

You can use a dictionary comprehension to create a sub dataframe for each Name with an Activity value of 3 or 5.

active_names = df[df.Activity.isin([3, 5])].Name.unique().tolist()
dfs = {name: df.loc[df.Name == name, :] for name in active_names}

>>> dfs['A']
  Name        Date  Activity
0    A  01-02-2015         1
1    A  01-03-2015         2
2    A  01-04-2015         3
3    A  01-04-2015         1

>>> dfs['B']
  Name        Date  Activity
4    B  01-02-2015         1
5    B  01-02-2015         2
6    B  01-03-2015         1
7    B  01-04-2015         5

Upvotes: 1

jezrael
jezrael

Reputation: 862511

You can groupby dataframe by column Name, apply custom function f and then select dataframes df_A and df_B:

print df
   Name       Date  Activity
0     A 2015-01-02         1
1     A 2015-01-03         2
2     A 2015-01-04         3
3     A 2015-01-04         1
4     B 2015-01-02         1
5     B 2015-01-02         2
6     B 2015-01-03         1
7     B 2015-01-04         5
8     C 2015-01-31         1
9     C 2015-01-31         2
10    C 2015-01-31         2


def f(df):
    if ((df['Activity'] == 3) | (df['Activity'] == 5)).any():
        return df

g = df.groupby('Name').apply(f).reset_index(drop=True)
df_A = g.loc[g.Name == 'A']
print df_A
  Name       Date  Activity
0    A 2015-01-02         1
1    A 2015-01-03         2
2    A 2015-01-04         3
3    A 2015-01-04         1

df_B = g.loc[g.Name == 'B']
print df_B
  Name       Date  Activity
4    B 2015-01-02         1
5    B 2015-01-02         2
6    B 2015-01-03         1
7    B 2015-01-04         5

df_A.plot()
df_B.plot()

In the end you can use plot - more info

EDIT:

If you want create dataframes dynamically, use can find all unique values of column Name by drop_duplicates:

for name in g.Name.drop_duplicates():
    print g.loc[g.Name == name]

      Name       Date  Activity
0    A 2015-01-02         1
1    A 2015-01-03         2
2    A 2015-01-04         3
3    A 2015-01-04         1
  Name       Date  Activity
4    B 2015-01-02         1
5    B 2015-01-02         2
6    B 2015-01-03         1
7    B 2015-01-04         5

Upvotes: 3

Related Questions