Reputation: 1694
I have a DataFrame called df, as follows:
name  id  age  text
a     1   1    very good, and I like him
b     2   2    I play basketball with his brother
c     3   3    I hope to get a offer
d     4   4    everything goes well, I think
a     1   1    I will visit china
b     2   2    no one can understand me, I will solve it
c     3   3    I like followers
d     4   4    maybe I will be good
a     1   1    I should work hard to finish my research
b     2   2    water is the source of earth, I agree it
c     3   3    I hope you can keep in touch with me
d     4   4    My baby is very cute, I like him
There are four names: a, b, c and d, and each name has an id, an age and a text. The id and age are the same within each name group, but the text differs from row to row. Here each name has three rows (this is just an example, the real data is much larger).
I want to get the id and age for each name group. In addition, I want to calculate the index of a character in every text of each group, using the function extract_text(text). For example, for name 'a' I want: age: 1, id: 1, and the index of 'I' in its three rows (illustrative values, not the real ones): 20, 0, 0.
I have tried the following:
import pandas as pd

def extract_text(text):
    index_n = None
    text_len = len(text)
    for i in range(0, text_len, 1):
        if text[i] == 'I':
            index_n = i
    return index_n

df = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd',
                            'a', 'b', 'c', 'd'],
                   'id': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'age': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'text': ['very good, and I like him',
                            'I play basketball with his brother',
                            'I hope to get a offer',
                            'everything goes well, I think',
                            'I will visit china',
                            'no one can understand me, I will solve it',
                            'I like followers',
                            'maybe I will be good',
                            'I should work hard to finish my research',
                            'water is the source of earth, I agree it',
                            'I hope you can keep in touch with me',
                            'My baby is very cute, I like him']})
id_num = df.groupby('name')['id'].value[0]
id_num = df.groupby('age')['id'].value[0]
index_num = df.groupby('age')['text'].apply(extract_text)
But I get an error:
Traceback (most recent call last):
  File "bot_test_new.py", line 25, in <module>
    id_num = df.groupby('name')['id'].value[0]
AttributeError: 'SeriesGroupBy' object has no attribute 'value'
Please give me a hand, thanks in advance!
Upvotes: 2
Views: 3506
Reputation: 862511
I think you can use str.find:
print (df.groupby('age')['text'].apply(lambda x: x.str.find('I').tolist()))
age
1 [15, 0, 0]
2 [0, 26, 30]
3 [0, 0, 0]
4 [22, 6, 22]
Name: text, dtype: object
If you need id_num, use iloc:
id_num = df.groupby('name')['id'].apply(lambda x: x.iloc[0])
print (id_num)
name
a 1
b 2
c 3
d 4
Name: id, dtype: int64
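Since the question asks for both id and age per name, here is a minimal sketch of one way to get them together. It assumes, as the question states, that id and age are constant within each name; the variable names per_name and per_name_alt are only illustrative, and the frame is a shortened copy of the sample data.

```python
import pandas as pd

# Shortened copy of the sample data (text column omitted)
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'id': [1, 2, 3, 4, 1, 2, 3, 4],
    'age': [1, 2, 3, 4, 1, 2, 3, 4],
})

# first() keeps the first non-null value of each column within each group
per_name = df.groupby('name')[['id', 'age']].first()
print(per_name)

# Alternative that relies on id and age being constant within each name
per_name_alt = df[['name', 'id', 'age']].drop_duplicates().set_index('name')
print(per_name_alt)
```

Both give one row per name; the second only works as intended while the assumption about constant id and age holds.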
But it looks like this alone may be enough:
df['position'] = df['text'].str.find('I')
print (df)
age id name text position
0 1 1 a very good, and I like him 15
1 2 2 b I play basketball with his brother 0
2 3 3 c I hope to get a offer 0
3 4 4 d everything goes well, I think 22
4 1 1 a I will visit china 0
5 2 2 b no one can understand me, I will solve it 26
6 3 3 c I like followers 0
7 4 4 d maybe I will be good 6
8 1 1 a I should work hard to finish my research 0
9 2 2 b water is the source of earth, I agree it 30
10 3 3 c I hope you can keep in touch with me 0
11 4 4 d My baby is very cute, I like him 22
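If one row per name is still wanted after adding the position column, a possible sketch is to combine it with named aggregation. This is an assumption about the desired output shape, not something stated in the question; the name summary is illustrative and the frame is a shortened copy of the sample data.

```python
import pandas as pd

# Shortened copy of the sample data
df = pd.DataFrame({
    'name': ['a', 'b', 'a', 'b'],
    'id': [1, 2, 1, 2],
    'age': [1, 2, 1, 2],
    'text': ['very good, and I like him',
             'I play basketball with his brother',
             'I will visit china',
             'no one can understand me, I will solve it'],
})

# Index of the first 'I' in each text (-1 if there is none)
df['position'] = df['text'].str.find('I')

# One row per name: first id, first age, and all positions as a list
summary = df.groupby('name').agg(
    id=('id', 'first'),
    age=('age', 'first'),
    positions=('position', list),
)
print(summary)
```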
Upvotes: 1
Reputation: 420
I'll elaborate a bit more than in the comment. The problem is that extract_text can only handle an individual string. However, when you groupby and then apply, the function receives a Series containing all the strings in the group, not one string at a time.
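A quick way to check what the function is given is to record the type of each group. This is only a diagnostic sketch; inspect_group and received are hypothetical names, and the frame is a shortened copy of the sample data.

```python
import pandas as pd

# Shortened copy of the sample data
df = pd.DataFrame({
    'age': [1, 2, 1, 2],
    'text': ['very good, and I like him',
             'I play basketball with his brother',
             'I will visit china',
             'no one can understand me, I will solve it'],
})

received = []

def inspect_group(group):
    # Record the type pandas passes in for each group
    received.append(type(group).__name__)
    return len(group)

sizes = df.groupby('age')['text'].apply(inspect_group)
print(received)  # type names of the objects passed to the function
print(sizes)     # number of texts in each age group
```

Each call gets the whole group of texts as a pandas Series, which is why a function written for a single string does not work directly.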
There are two solutions, the first is the one I indicated (sending individual strings):
index_num = df.groupby('age')['text'].apply(lambda x: [extract_text(_) for _ in x])
The other is changing extract_text so it can handle the list of strings:
def extract_text(list_texts):
    list_index = []
    for text in list_texts:
        index_n = None
        text_len = len(text)
        for i in range(0, text_len, 1):
            if text[i] == 'I':
                index_n = i
        list_index.append(index_n)
    return list_index
And then continue with:
index_num = df.groupby('age')['text'].apply(extract_text)
Moreover, you can use text.find("I") instead of your loop inside extract_text. Something like this: def extract_text(list_texts): return [text.find("I") for text in list_texts]. Note that find returns -1, not None, when the character is absent.
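One caveat when swapping the loop for find, sketched below under the assumption that the loop has no early exit (the indentation of the posted code was lost, so this is a guess): such a loop keeps the last match, while str.find returns the first one. The helper name last_index_loop is hypothetical.

```python
def last_index_loop(text, char='I'):
    # Same idea as the loop in the question, assuming no early exit:
    # index_n is overwritten on every match, so the last one wins
    index_n = None
    for i, c in enumerate(text):
        if c == char:
            index_n = i
    return index_n

text = 'I think I can'

print(text.find('I'))         # first occurrence
print(text.rfind('I'))        # last occurrence
print(last_index_loop(text))  # matches rfind when the character is present

# When the character is missing, find/rfind return -1 but the loop returns None
print('no match'.find('I'), last_index_loop('no match'))
```

In the sample data every text contains exactly one 'I', so all three approaches agree there; they only diverge on texts with several matches or none.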
Upvotes: 1