Alex Rodrigues

Reputation: 55

Summarize pandas DataFrame

I want to summarize a pandas DataFrame. It looks like this:

City      Name      Date
London    Joey      1998
Vegas     Chandler  1999

The result should read like this: In 1998 Joey was in London. In 1999 Chandler was in Vegas. Is there a way to do this, or a module that would help? Thank you.

Upvotes: 0

Views: 266

Answers (3)

DapperDuck

Reputation: 2876

With this csv:

City,Name,Date
London,Joey,1998
Vegas,Chandler,1999

You can use the following code:

import pandas as pd

df = pd.read_csv("test.csv")
for i in range(len(df)):
    print(f"In {df.iloc[i,2]}, {df.iloc[i,1]} was in {df.iloc[i,0]}.")

It iterates over the rows of the DataFrame and uses an f-string to build each sentence, reading the values of each row by position with df.iloc (column 0 is City, 1 is Name, 2 is Date).
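Reading values by position makes it easy to mix up columns. As a rough sketch of the same loop with access by column name, assuming the column names from the CSV above and building the data inline instead of reading test.csv, DataFrame.itertuples could be used:

```python
import pandas as pd

# Sample data from the question, built inline (assumption: same columns as test.csv).
df = pd.DataFrame(
    {"City": ["London", "Vegas"], "Name": ["Joey", "Chandler"], "Date": [1998, 1999]}
)

# itertuples yields one namedtuple per row, so each field is read by column name.
sentences = [
    f"In {row.Date}, {row.Name} was in {row.City}."
    for row in df.itertuples(index=False)
]

for sentence in sentences:
    print(sentence)
```

This only works as written when the column names are valid Python identifiers; otherwise positional access is still needed.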

Upvotes: 2

Rahul Trivedi

Reputation: 124

Given data

import pandas as pd
df = pd.DataFrame(data=[['London', 'Joey', '1998'], ['Vegas', 'Chandler', '1999']], columns=['City', 'Name', 'Date'])

Add a new column holding the summary if you need it in the DataFrame, or collect the sentences in a list, as @gmdev suggested.

df['Summary']=df.apply(lambda x: 'In '+str(x.Date)+' '+str(x.Name)+' was in '+str(x.City),axis=1)
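For larger frames, the same column could probably be built without a row-wise apply by concatenating the columns directly. This is only a sketch, assuming the sample data given in this answer; astype(str) is there in case Date holds integers rather than strings:

```python
import pandas as pd

# Sample data as given in this answer.
df = pd.DataFrame(
    data=[["London", "Joey", "1998"], ["Vegas", "Chandler", "1999"]],
    columns=["City", "Name", "Date"],
)

# Element-wise string concatenation across whole columns, no per-row function call.
df["Summary"] = (
    "In " + df["Date"].astype(str) + " " + df["Name"] + " was in " + df["City"]
)

print(df["Summary"].tolist())
```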


Upvotes: 1

gmdev

Reputation: 3155

To solve this, you really only need to iterate over the rows. Using a list comprehension is going to be faster than using iterrows.

If you want to modify the DataFrame:

Here, we use DataFrame.apply to, well, "apply" the function over each row:

def format_row(row):
    return f"In {row['Date']}, {row['Name']} was in {row['City']}."

# Note: this replaces df with a Series holding one sentence per row.
df = df.apply(format_row, axis=1)
print(df)

Output:

0       In 1998, Joey was in London.
1    In 1999, Chandler was in Vegas.

If you want the sentences as a list:

You can define a function that formats the row, like so:

def format_row(row):
    return f"In {row[0]}, {row[1]} was in {row[2]}."

Then use a list comprehension that zips the columns in the order the function expects and passes each resulting tuple to it:

rows = [format_row(r) for r in zip(df["Date"], df["Name"], df["City"])]

If these are the only columns in the DataFrame, using DataFrame.values is cleaner and gives the same output:

rows = [format_row(r) for r in df.values]

In that case, each row arrives in column order (City, Name, Date), so you'll have to swap the index values within the function:

return f"In {row[2]}, {row[1]} was in {row[0]}."
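If keeping track of positions feels fragile, one possible alternative is to pass each row as a dict so the function can look fields up by name regardless of column order. A minimal sketch, assuming the question's sample data with Date stored as strings:

```python
import pandas as pd

# Sample data from the question (assumption: Date stored as strings).
df = pd.DataFrame(
    data=[["London", "Joey", "1998"], ["Vegas", "Chandler", "1999"]],
    columns=["City", "Name", "Date"],
)

def format_row(row):
    # row is a dict keyed by column name, so column order does not matter.
    return f"In {row['Date']}, {row['Name']} was in {row['City']}."

# to_dict("records") returns one dict per row.
rows = [format_row(r) for r in df.to_dict("records")]
print(rows)
```

Building a dict per row costs more than zipping the columns, so this trades some speed for readability.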

Upvotes: 1
