AMayer
AMayer

Reputation: 425

Get the last column of a pd.dataFrame and add it to another pd.dataFrame

I have an Excel file that looks like this:

CompanyName    High Priority     QualityIssue
Customer1         Yes             User
Customer1         Yes             User
Customer2         No              User
Customer3         No              Equipment
Customer1         No              Neither
Customer3         No              User
Customer3         Yes             User
Customer3         Yes             Equipment
Customer4         No              User

I want to count how many time each instance in CompanyName appears per every type of QualityIssue and sort by the number of appearance descend.

For example, by using this code:

df.groupby(["CompanyName ", "QualityIssue"]).size().to_frame('Count')

I get:

Out:

CompanyName       QualityIssue    Count
Customer2         User            1
Customer1         Neither         1
Customer4         User            1
Customer1         User            2
Customer3         Equipment       2
Customer3         User            2

And then let's say that I have another copy of the above in memory.

What I want is to add the last column of the second query to the end of the first one (In reality it won't be a copy of it, it's just an example):

CompanyName       QualityIssue    Count1    Count2
Customer2         User            1        1
Customer1         Neither         1        1
Customer4         User            1        1
Customer1         User            2        2
Customer3         Equipment       2        2
Customer3         User            2        2      

The problem here is that if I do

df['Count'] 

it will not print only that column, it will print everything, just like doing

print df  

So I cannot find a way to get only the last column of the dataFrame to add it to another one.

Upvotes: 1

Views: 50

Answers (1)

piRSquared
piRSquared

Reputation: 294278

Quick and simple way using groupby and size

df.groupby(['CompanyName', 'QualityIssue']).size()

CompanyName  QualityIssue
Customer1    Neither         1
             User            2
Customer2    User            1
Customer3    Equipment       2
             User            2
Customer4    User            1
dtype: int64

Supposing we have another in memory

c1 = df.groupby(['CompanyName', 'QualityIssue']).size()
c2 = c1.copy()

Then use pd.concat

pd.concat([c1, c2], keys=['Count1', 'Count2']).unstack(0, fill_value=0)

                          Count1  Count2
CompanyName QualityIssue                
Customer1   Neither            1       1
            User               2       2
Customer2   User               1       1
Customer3   Equipment          2       2
            User               2       2
Customer4   User               1       1

reset_index if you want the indices back in the dataframe proper.

pd.concat([c1, c2], keys=['Count1', 'Count2']).unstack(0, fill_value=0) \
    .reset_index()

  CompanyName QualityIssue  Count1  Count2
0   Customer1      Neither       1       1
1   Customer1         User       2       2
2   Customer2         User       1       1
3   Customer3    Equipment       2       2
4   Customer3         User       2       2
5   Customer4         User       1       1

Upvotes: 1

Related Questions