James Maguire
James Maguire

Reputation: 570

Pandas - Parsing Bank Statement

I have 2 dataframes - transactions and companies.

Transactions comes from a csv directly from the bank. There is a column "booking text" which has a long string with the details of the transaction, these strings vary depending on a number of factors but for the most part look sort of like this:

"Company Co. DERFTHD DE89758975869857 657878987 End-to-End-Ref:. FRG.3.GH.15789"

Companies is a list of companies with two columns - "search key" and "company name" which I have populated in order to search these "booking text" strings and return company names.

I want to create a new column in the transactions frame which uses the "search key" and contains the "company name"

EDIT: Sorry - I am a total newb at this. Let me try again.

So I have a dataframe of names

In [1]: df1 = pd.DataFrame([['cat','Bob'], ['dog','Joe'],['bird','Lary']], 
              columns=['A','B'])
Out[2]: df1
      A      B
0    cat    Bob
1    dog    Joe
2    bird   Lary

and a dataframe of strings (as well as other columns)

In [3]: df2 = pd.DataFrame([['the cat is big','2'],['the cat is small', 4],
              ['the dog is small',3]], columns=['C','D']
Out[4]:
            C            D
0    the cat is big      2
1    the cat is small    4
2    the dog is small    3

I want to be able to create a new column "E" in df2 which contains the "subject" of the string in column C based on the information in df1.

So "the cat is big" contains "cat" and so the value should be "Bob"

My desired output is this.

Out[5]: df3
             C           D     E
0    the cat is big      2    Bob
1    the cat is small    4    Bob
2    the dog is small    3    Joe

Upvotes: 0

Views: 1713

Answers (1)

d swamy
d swamy

Reputation: 106

create a dictionary for df1 instead of a dataframe.

df1_dict = {'cat':'Bob', 'dog':'Joe', 'bird':'Lary' }

for key, value in df1_dict.iteritems(): df2.loc[df2['C'].str.contains(key),'E'] = value

print df2

Upvotes: 3

Related Questions