Naga

Reputation: 311

Pandas: split a DataFrame based on a list of column values

I have a large data set of regions, and I want to split the dataframe into multiple dataframes based on lists of regions.

Example:

regions         val1    val2
A                1        2
A                1        2
B                1        2
C                1        2
D                1        2
E                1        2
A                1        2

I want to split the above dataframe into two, one for regions (A, E) and one for regions (B, C, D):

DF1:
    regions         val1    val2
    A                1        2
    A                1        2
    E                1        2
    A                1        2

DF2:
    regions         val1    val2
    B                1        2
    C                1        2
    D                1        2

I tried this by manually specifying df[(df['regions'] == 'A') | (df['regions'] == 'E')], but I want to avoid writing out the region codes by hand for each dataframe. I'm quite new to pandas. Is there any way to do it?

Upvotes: 1

Views: 1272

Answers (1)

jezrael

Reputation: 863791

To avoid creating each DataFrame by hand, you can build a dictionary of DataFrames with a dictionary comprehension, using Series.isin and boolean indexing to filter the rows:

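# each tuple lists the regions that go into one DataFrame
L = [('A', 'E'), ('B', 'C', 'D')]

# key: region codes joined by '_', value: rows whose region is in the tuple
dfs = {'_'.join(x): df[df['regions'].isin(x)] for x in L}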
print (dfs)
{'A_E':   regions  val1  val2
0       A     1     2
1       A     1     2
5       E     1     2
6       A     1     2, 'B_C_D':   regions  val1  val2
2       B     1     2
3       C     1     2
4       D     1     2}

To select each DataFrame, use its key:

print (dfs['A_E'])
  regions  val1  val2
0       A     1     2
1       A     1     2
5       E     1     2
6       A     1     2

print (dfs['B_C_D'])
  regions  val1  val2
2       B     1     2
3       C     1     2
4       D     1     2
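
For reference, here is a self-contained sketch of the dictionary approach that can be pasted and run as is. The DataFrame constructor call is a reconstruction of the sample data from the question, not part of the original code:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real layout)
df = pd.DataFrame({
    'regions': ['A', 'A', 'B', 'C', 'D', 'E', 'A'],
    'val1': [1] * 7,
    'val2': [2] * 7,
})

# Each tuple lists the regions that belong in one output DataFrame
L = [('A', 'E'), ('B', 'C', 'D')]

# Key: region codes joined by '_'; value: rows whose region is in the tuple
dfs = {'_'.join(x): df[df['regions'].isin(x)] for x in L}

# Show which original row labels ended up in each piece
for name, part in dfs.items():
    print(name, part.index.tolist())
```

Each piece keeps the original index labels, so rows can still be traced back to df. If a fresh 0..n-1 index is preferred, chaining .reset_index(drop=True) onto the filter should do it.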

The manual solution is:

df1 = df[df['regions'].isin(('A','E'))]
print (df1)
  regions  val1  val2
0       A     1     2
1       A     1     2
5       E     1     2
6       A     1     2

df2 = df[df['regions'].isin(('B','C','D'))]
print (df2)
  regions  val1  val2
2       B     1     2
3       C     1     2
4       D     1     2
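
If every region belongs to at most one group, another option is to map each region to a group label and let groupby do the split in a single pass. This is a different technique from the isin filtering above, offered only as a sketch; the names region_to_group, labels and dfs_grouped are illustrative, and the sample data is rebuilt from the question:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real layout)
df = pd.DataFrame({
    'regions': ['A', 'A', 'B', 'C', 'D', 'E', 'A'],
    'val1': [1] * 7,
    'val2': [2] * 7,
})

L = [('A', 'E'), ('B', 'C', 'D')]

# Map each region code to the label of the group it belongs to,
# e.g. 'A' -> 'A_E', 'C' -> 'B_C_D'
region_to_group = {region: '_'.join(group) for group in L for region in group}

# Regions missing from the mapping become NaN, and groupby skips NaN keys by default
labels = df['regions'].map(region_to_group)

# One DataFrame per group label
dfs_grouped = {name: part for name, part in df.groupby(labels)}

print(dfs_grouped['A_E'])
print(dfs_grouped['B_C_D'])
```

Unlike the isin version, this cannot put the same region into two different DataFrames, because each region maps to exactly one label.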

Upvotes: 3
