tktktk0711

Reputation: 1694

Python 3 pandas: group a DataFrame by a column (such as name), then extract a number of rows from each group

I have a DataFrame called df, as follows:

name   id    age             text 
a      1     1    very good, and I like him
b      2     2    I play basketball with his brother
c      3     3    I hope to get a offer
d      4     4    everything goes well, I think
a      1     1    I will visit china
b      2     2    no one can understand me, I will solve it
c      3     3    I like followers
d      4     4    maybe I will be good
a      1     1    I should work hard to finish my research
b      2     2    water is the source of earth, I agree it
c      3     3    I hope you can keep in touch with me
d      4     4    My baby is very cute, I like him

I group the DataFrame by name, and then I want to take the first N rows of each group by position (for example, N = 2) to build a new DataFrame, df_new:

name   id    age             text 
a      1     1    very good, and I like him
a      1     1    I will visit china
b      2     2    I play basketball with his brother
b      2     2    no one can understand me, I will solve it
c      3     3    I hope to get a offer
c      3     3    I like followers
d      4     4    everything goes well, I think
d      4     4    maybe I will be good



I tried:

  df_new = df.groupby('name')[0:2]

But I get this error:

   hash(key)
  TypeError: unhashable type: 'slice'
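Why the attempt fails: indexing a GroupBy object with [] selects columns (as in df.groupby('name')['text']), not rows, so pandas treats 0:2 as a column label and tries to look it up, which for a slice means hashing it. A small sketch of the difference, assuming a five-row stand-in DataFrame with the question's columns. The exact exception raised by the slice can vary with the pandas and Python versions (slices became hashable in Python 3.12, so newer setups may report a missing column instead):

```python
import pandas as pd

# Small stand-in for the question's df (same columns, fewer rows)
df = pd.DataFrame({
    "name": ["a", "b", "a", "b", "a"],
    "id":   [1, 2, 1, 2, 1],
    "age":  [1, 2, 1, 2, 1],
    "text": ["t0", "t1", "t2", "t3", "t4"],
})

grouped = df.groupby("name")

# [] on a GroupBy picks a column; here it counts the texts in each group
texts_per_group = grouped["text"].count()
print(texts_per_group)

# A slice is not a column label, so grouped[0:2] raises instead of slicing rows;
# the exception type depends on the pandas/Python version
error = None
try:
    grouped[0:2]
except (TypeError, KeyError) as exc:
    error = exc
    print(type(exc).__name__, exc)

# Selecting rows by position within each group needs a GroupBy method such as head()
first_two = grouped.head(2)
print(first_two)
```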

Upvotes: 0

Views: 56

Answers (2)

jezrael

Reputation: 862661

Another solution, using iloc inside apply:

df_new = df.groupby('name').apply(lambda x: x.iloc[:2]).reset_index(drop=True)
print(df_new)
  name  id  age                                       text
0    a   1    1                  very good, and I like him
1    a   1    1                         I will visit china
2    b   2    2         I play basketball with his brother
3    b   2    2  no one can understand me, I will solve it
4    c   3    3                      I hope to get a offer
5    c   3    3                           I like followers
6    d   4    4              everything goes well, I think
7    d   4    4                       maybe I will be good
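Roughly speaking, groupby('name').apply(lambda x: x.iloc[:2]) runs the lambda on each group's sub-DataFrame and concatenates the results. The explicit loop below is a rough sketch of that idea only (it is not how pandas implements apply internally), assuming a small stand-in DataFrame with just the name and text columns:

```python
import pandas as pd

# Small stand-in for the question's df
df = pd.DataFrame({
    "name": ["a", "b", "a", "b", "a", "b"],
    "text": ["t0", "t1", "t2", "t3", "t4", "t5"],
})

# Iterating a GroupBy yields (group key, sub-DataFrame) pairs, sorted by key by default
parts = []
for name, group in df.groupby("name"):
    parts.append(group.iloc[:2])  # first two rows of this group, by position

# Stack the pieces in group order and renumber the index from 0
df_new = pd.concat(parts).reset_index(drop=True)
print(df_new)
```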

Upvotes: 1

Bob Haffner

Reputation: 8493

Try using head() instead.

import pandas as pd
from io import StringIO

buff = StringIO('''
name,id,age,text
a,1,1,"very good, and I like him"
b,2,2,I play basketball with his brother
c,3,3,I hope to get a offer
d,4,4,"everything goes well, I think"
a,1,1,I will visit china
b,2,2,"no one can understand me, I will solve it"
c,3,3,I like followers
d,4,4,maybe I will be good
a,1,1,I should work hard to finish my research
b,2,2,"water is the source of earth, I agree it"
c,3,3,I hope you can keep in touch with me
d,4,4,"My baby is very cute, I like him"
''')
df = pd.read_csv(buff)

Using head(2) instead of [0:2], then sorting by name:

df_new = df.groupby('name').head(2).sort_values('name', kind='mergesort')  # stable sort keeps each group's original row order
print(df_new)
  name  id  age                                       text
0    a   1    1                  very good, and I like him
4    a   1    1                         I will visit china
1    b   2    2         I play basketball with his brother
5    b   2    2  no one can understand me, I will solve it
2    c   3    3                      I hope to get a offer
6    c   3    3                           I like followers
3    d   4    4              everything goes well, I think
7    d   4    4                       maybe I will be good
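A further option, swapped in here and not part of either answer, avoids apply by numbering the rows inside each group with GroupBy.cumcount() and keeping those numbered below n. A minimal sketch, assuming a nine-row stand-in DataFrame and a hypothetical variable n for the number of rows to keep per group:

```python
import pandas as pd

# Shortened stand-in for the question's df
df = pd.DataFrame({
    "name": ["a", "b", "c", "a", "b", "c", "a", "b", "c"],
    "text": [f"row {i}" for i in range(9)],
})

n = 2  # hypothetical: rows to keep per group

# cumcount() numbers the rows of each group 0, 1, 2, ... in their original order
position_in_group = df.groupby("name").cumcount()

# Keep the first n rows of every group, then order by name;
# mergesort is stable, so rows keep their original order inside each group
df_new = (
    df[position_in_group < n]
    .sort_values("name", kind="mergesort")
    .reset_index(drop=True)
)
print(df_new)
```

Like head(), this keeps the selected rows in their original order before sorting, and reset_index(drop=True) gives the same 0-based index as the iloc answer.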

Upvotes: 1
