hula-hula
hula-hula

Reputation: 119

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 31978')

I am quite new to panda (just a few days getting my hands on it), although I am still in progress to learn and to explore on using the Pandas. I have a big size of csv file consist of hundred thousands of rows. My objective is to concatenate multiple row into a single row based on multiple column. On top of that, by referring the date/time as well which later need to be included. Below illustrate my csv file.

       Body                      UDH               Original Sender ID           Received Date/Time
Hi John, Can You            ABC0010101                  GGQMS                   01/02/2001 01:03:19
Wait A moment?              ABC0010102                  GGQMS                   01/02/2001 01:03:20
Whats is                    050004000111              112233445566              01/03/2001 11:16:01
Carrine Doing               050004000112              112233445566              01/03/2001 11:16:01
Over There?                 050004000113              112233445566              01/03/2001 11:16:02
Where is                    CD10F1011                   zwerty                  01/03/2001 15:22:10
Your Homework?              CD10F1012                   zwerty                  01/03/2001 15:22:11
Order for Pizza             AACCDD55001               112233445566              01/04/2001 19:20:21
Now for cheap $.            AACCDD55002               112233445566              01/04/2001 19:20:22
John, you know              G0500781                    GGQMS                   01/04/2001 10:21:21
Where can I get it?         G0500782                    GGQMS                   01/04/2001 10:21:21

As you can see above is my csv file. The UDH here act as the primary key, as per number of characters (first until second last) is where we can identify the body is belongs to. Another part is the Received Date/Time where second part of body received a 1 second late or maybe more than 1 second.

I've managed to concatenation the body, however, certain body consist of third part which I didn't manage to concat the body altogether.

Below is my current codes:

 def problem3():
    filep2 = pd.read_csv(r'/Users/John/Downloads/Practice1/my_r.csv')

    #data cleaning
    filep2['Received Date/Time']= filep2['Received Date/Time'].astype('datetime64[ns]')
    filep2['UDH']=filep2['UDH'].astype(object)
    filep2['Original Sender ID']=filep2['Original Sender ID'].astype(object)
    filep2['Account User Name']=filep2['Account User Name'].astype(object)
    filep2['Body']=filep2['Body'].astype(str)
    filep2['UDH']=filep2['UDH'].str.strip()
    df = pd.DataFrame(filep2)

    #Filter null row in UDH column
    df=df[df['UDH'].notnull()]
    df=df.sort_values(by ='UDH')

    df['Body'] = df.apply(multiple_condition, axis=1)    
    df.to_csv(r'/Users/John/Downloads/Practice1/my_c.csv', index=False, header=True) 

def multiple_condition (df):
    if (df['UDH'].str.len() == 8):
         df=df.groupby(df[['UDH'].str[:7],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
         return df
    elif (df['UDH'].str.len() == 9):
         df= df.groupby(df[['UDH'].str[:8],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index() 
         return df
    elif (df['UDH'].str.len() == 10):
         df= df.groupby(df[['UDH'].str[:9],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
         return df
    elif (df['UDH'].str.len() == 11):
         df=df.groupby(df[['UDH'].str[:10],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index() 
         return df
    elif (df['UDH'].str.len() == 12):
         df=df.groupby(df[['UDH'].str[:11],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index() 
         return df

Above code gave the error as stated as the subject of this topic/ticket. Error message stated as follows;

UPDATED ERROR MESSAGE

  Traceback (most recent call last):

  File "<ipython-input-85-8ca58b5f49ad>", line 1, in <module>
    runfile('/Users/syafiq/Downloads/RoutingPractice01.py', wdir='/Users/syafiq/Downloads')

  File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/John/Downloads/RoutingPractice01.py", line 79, in <module>
    problem3()

  File "/Users/John/Downloads/RoutingPractice01.py", line 35, in problem3
    filep2['Received Date/Time']= filep2['Received Date/Time'].astype('datetime64[ns]')

  File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2980, in __getitem__
    indexer = self.columns.get_loc(key)

  File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc

  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc

  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Received Date/Time'

Below is the desired output:

                Body                   Original Sender ID              Received Date/Time
Hi John, Can You Wait A Moment?             GGQMS                      01/02/2001 01:03:20
What Is carbine doing over there?        112233445566                  01/03/2001 11:16:02
Where is your homework?                    zwerty                      01/03/2001 15:22:11
Order for Pizza Now for cheap $          112233445566                  01/04/2001 19:20:22
John, you know where can I get it?          GGQMS                      01/04/2001 10:21:21

Noted: I've try multiple way on to obtain the desired output above, still couldn't resolve/error occurred. I've tried countless time with different approach, still no dice, keep hitting the brick wall. The UDH is the Identifier for group the body

I'm still new to Pandas and haven't get my hands dirty with Python in a while. I do appreciate if somebody could highlight where did I do wrong. And as well would be much grateful with your help on obtaining my desired output.

Thank you very much and very appreciated! :)

Upvotes: 1

Views: 339

Answers (1)

furas
furas

Reputation: 142631

I can get (more or less) expected result without apply() but using directly groupby()

groups = df.groupby([df['UDH'].str[:-1], 'Original Sender ID'])

df2 = groups.agg({'Body':' '.join, 'Received Date/Time':max}).reset_index()

I use io.StringIO() only to simulate file.

text = '''       Body                      UDH               Original Sender ID           Received Date/Time
Hi John, Can You            ABC0010101                  GGQMS                   01/02/2001 01:03:19
Wait A moment?              ABC0010102                  GGQMS                   01/02/2001 01:03:20
Whats is                    050004000111              112233445566              01/03/2001 11:16:01
Carrine Doing               050004000112              112233445566              01/03/2001 11:16:01
Over There?                 050004000113              112233445566              01/03/2001 11:16:02
Where is                    CD10F1011                   zwerty                  01/03/2001 15:22:10
Your Homework?              CD10F1012                   zwerty                  01/03/2001 15:22:11
Order for Pizza             AACCDD55001               112233445566              01/04/2001 19:20:21
Now for cheap $.            AACCDD55002               112233445566              01/04/2001 19:20:22
John, you know              G0500781                    GGQMS                   01/04/2001 10:21:21
Where can I get it?         G0500782                    GGQMS                   01/04/2001 10:21:21'''

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text), sep='\s{2,}')

#df['Received Date/Time'] = df['Received Date/Time'].astype('datetime64[ns]')
#df['UDH'] = df['UDH'].astype(object)
#df['Original Sender ID'] = df['Original Sender ID'].astype(object)
#df['Account User Name'] = df['Account User Name'].astype(object)
#df['Body'] = df['Body'].astype(str)
#df['UDH'] = df['UDH'].str.strip()

#Filter null row in UDH column
#df = df[df['UDH'].notnull()]
#df = df.sort_values(by ='UDH')

#groups = df.groupby([df['UDH'].str[:-1], 'Original Sender ID'])
#for name, data in groups:
    #print(name)
#    data['Received Date/Time'] = data['Received Date/Time'].min()
    #print(data)

groups = df.groupby([df['UDH'].str[:-1], 'Original Sender ID'])
df2 = groups.agg({'Body':' '.join, 'Received Date/Time':max}).reset_index()

#groups = df.groupby([df['UDH'].str[:-1]])
#df2 = groups.agg({'Body':' '.join, 'Received Date/Time':max, 'Original Sender ID':min}).reset_index()

df2 = df2.sort_values('Received Date/Time')

pd.options.display.width = 200
print(df2)

Result

           UDH Original Sender ID                                Body   Received Date/Time
2    ABC001010              GGQMS     Hi John, Can You Wait A moment?  01/02/2001 01:03:20
0  05000400011       112233445566  Whats is Carrine Doing Over There?  01/03/2001 11:16:02
3     CD10F101             zwerty             Where is Your Homework?  01/03/2001 15:22:11
4      G050078              GGQMS  John, you know Where can I get it?  01/04/2001 10:21:21
1   AACCDD5500       112233445566    Order for Pizza Now for cheap $.  01/04/2001 19:20:22

Upvotes: 1

Related Questions