Reputation: 25
Sort a data frame by the values of the 5th column ["Card"] and add a new row after each card number with its count
After sorting the values, how can I add a new row with the total after each card number?
The data frame looks something like this:
This is how I want the output data frame to look:
Upvotes: 0
Views: 1892
Reputation: 25
I'm not sure if this is the best way, but it worked for me.
list_of_df = [group for _, group in df.groupby("Card")]
for i, group in enumerate(list_of_df):
    # DataFrame.append was removed in pandas 2.0, so build the count row and concat it
    count_row = pd.DataFrame({"Card": [str(len(group))]})
    list_of_df[i] = pd.concat([group, count_row], ignore_index=True)
final_df = pd.concat(list_of_df, ignore_index=True)
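The question asks for a row labeled with a total rather than a bare count. A minimal sketch of that variation follows; the function name `add_total_rows`, the "Total: n" label format, and the sample data are assumptions for illustration, not part of the original answer.

```python
import pandas as pd


def add_total_rows(df: pd.DataFrame, column: str = "Card") -> pd.DataFrame:
    """Return df grouped by `column`, with a 'Total: n' row after each group."""
    pieces = []
    # groupby sorts the group keys by default, so groups come out in Card order
    for _, group in df.groupby(column):
        # One extra row per group; every column is blank except `column`
        total = {name: "" for name in df.columns}
        total[column] = f"Total: {len(group)}"
        pieces.extend([group, pd.DataFrame([total])])
    return pd.concat(pieces, ignore_index=True)


# Hypothetical sample data
sample = pd.DataFrame({
    "Name": ["Ed", "John", "Ed"],
    "Card": ["2222", "1111", "2222"],
})
result = add_total_rows(sample)
print(result)
```

Blank strings are used instead of NaN in the total rows so the frame prints cleanly; swap in `None` if missing values are preferred.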
Upvotes: 0
Reputation: 3128
You can give this a try:
import pandas as pd

# create dummy df
card = ["2222", "2222", "1111", "2222", "1111", "3333"]
name = ["Ed", "Ed", "John", "Ed", "John", "Kevin"]
phone = ["1##-###-####", "1##-###-####", "2##-###-####", "1##-###-####", "2##-###-####", "3##-###-####"]
df = pd.DataFrame({"Name": name, "Phone": phone, "Card": card})

# sort by Card value
df = df.sort_values(by=["Card"]).reset_index(drop=True)

# Group by the Card value, count the rows, then build one count row per group.
# Each count row gets the index of the first row of the next group.
index = 0
line = []
for x in df.groupby("Card").size():
    index += x
    line.append(pd.DataFrame({"Name": "", "Phone": "", "Card": str(x)}, index=[index]))

# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
df = pd.concat([df] + line, ignore_index=False)
# Sort by Card so each count row precedes the data row sharing its index,
# then use a stable index sort so that order is preserved
df = df.sort_values(by=["Card"]).sort_index(kind="stable").reset_index(drop=True)
df
Output:
    Name         Phone  Card
0   John  2##-###-####  1111
1   John  2##-###-####  1111
2                          2
3     Ed  1##-###-####  2222
4     Ed  1##-###-####  2222
5     Ed  1##-###-####  2222
6                          3
7  Kevin  3##-###-####  3333
8                          1
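The index-based insertion above places each count row by giving it an index that ties with a data row, so the result depends on how the count string compares with the neighbouring card string. A hedged alternative sketch gives each count row a half-step index instead, so a plain index sort is enough. The sample data and the names `sizes`, `last_pos`, `count_rows`, and `out` are assumptions for illustration, and the frame is assumed to be already sorted by Card with a clean 0..n-1 index.

```python
import pandas as pd

# Hypothetical sample data, already sorted by Card with a default integer index
df = pd.DataFrame({
    "Name": ["John", "John", "Ed", "Ed", "Ed", "Kevin"],
    "Card": ["1111", "1111", "2222", "2222", "2222", "3333"],
})

# Position of the last row of each group: cumulative size minus one
sizes = df.groupby("Card").size()
last_pos = sizes.cumsum() - 1

# Give each count row an index half a step past the last row of its group
count_rows = pd.DataFrame(
    {"Name": "", "Card": sizes.astype(str).to_numpy()},
    index=last_pos.to_numpy() + 0.5,
)

# No index values tie, so sorting on the index alone interleaves the rows
out = pd.concat([df, count_rows]).sort_index().reset_index(drop=True)
print(out)
```

Because no two rows share an index, the outcome does not depend on sort stability or on how the strings in Card compare.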
Edit: because the OP stores card numbers as strings, the code had to be changed to sort them naturally, so that integers stored as strings are ordered by numeric value.
import random

import pandas as pd
from natsort import natsort_keygen  # now needed because OP has Card numbers as strings

# create dummy df ##############
card = ["1111", "2222", "3333", "4444", "5555", "6666", "7777", "8888"]
name = ["Ed", "John", "Jake", "Mike", "Liz", "Anne", "Deb", "Steph"]
phone = ["1###", "2###", "3###", "4###", "5###", "6###", "7###", "8###"]
dfList = [a for a in zip(name, phone, card)]
# draw 50 random (name, phone, card) rows from the 8 people above
dfList = [dfList[random.randrange(len(dfList))] for i in range(50)]
df = pd.DataFrame(dfList, columns=["Name", "Phone", "Card"])
################################

# sort by Card value
df = df.sort_values(by=["Card"]).reset_index(drop=True)

# Group by the Card value, count the rows, then build one count row per group.
# Each count row gets the index of the last row of its own group.
index = 0
line = []
for x in df.groupby("Card").size():
    index += x
    line.append(pd.DataFrame({"Name": "", "Phone": "", "Card": str(x)}, index=[index - 1]))
df = pd.concat([df, pd.concat(line)], ignore_index=False)

# Create an Index column to be used in the "by" argument of pandas sort_values
df["Index"] = df.index

# Sort first by index, then by card number in descending order, so each count row
# follows the last row of its group; "natsort_keygen()" naturally sorts ints stored as strings
df = df.sort_values(by=["Index", "Card"], key=natsort_keygen(), ascending=[True, False]).reset_index(drop=True).drop(["Index"], axis=1)
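To see why string card numbers need special handling, here is a small sketch comparing a plain string sort with a numeric one. It uses the `key` argument of `sort_values` (available since pandas 1.1) with an integer conversion rather than natsort, which is a swapped-in technique and only works if every value consists purely of digits. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical card numbers of different lengths, stored as strings
df = pd.DataFrame({"Card": ["10", "9", "100"]})

# A plain string sort compares character by character
lexical = df.sort_values(by="Card")["Card"].tolist()

# Converting to int inside the key sorts by numeric value;
# the column itself stays a string column
numeric = df.sort_values(by="Card", key=lambda s: s.astype(int))["Card"].tolist()

print(lexical)  # ['10', '100', '9']
print(numeric)  # ['9', '10', '100']
```

When all card numbers have the same length, as in the dummy data above, the two orders coincide; the difference only shows up for mixed lengths, such as a short count string next to a four-digit card number.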
Upvotes: 1