Balkhrod

Reputation: 101

Pandas - how to create a new dataframe from the columns and values of an old dataframe?

I have a CSV file in which I have tweets with the following column names: File, User, Date 1, month, day, Tweet, Permalink, Retweet count, Likes count, Tweet value, Language, Location.

I want to create a new data frame containing only the tweets from certain cities. My code runs, but the result only contains the rows for the last city on my list (Girona); the rows for the other cities are missing. Here is my code:

import pandas as pd
import os

path_to_file = "populismo_merge.csv"

df = pd.read_csv(path_to_file, encoding='utf-8', sep=',')

values = df[df['Location'].str.contains("A Coruña",na=False)]
values = df[df['Location'].str.contains("Alava",na=False)]
values = df[df['Location'].str.contains("Albacete",na=False)]
values = df[df['Location'].str.contains("Alicante",na=False)]
values = df[df['Location'].str.contains("Almería",na=False)]
values = df[df['Location'].str.contains("Asturias",na=False)]
values = df[df['Location'].str.contains("Avila",na=False)]
values = df[df['Location'].str.contains("Badajoz",na=False)]
values = df[df['Location'].str.contains("Barcelona",na=False)]
values = df[df['Location'].str.contains("Burgos",na=False)]
values = df[df['Location'].str.contains("Cáceres",na=False)]
values = df[df['Location'].str.contains("Cádiz",na=False)]
values = df[df['Location'].str.contains("Cantabria",na=False)]
values = df[df['Location'].str.contains("Castellón",na=False)]
values = df[df['Location'].str.contains("Ceuta",na=False)]
values = df[df['Location'].str.contains("Ciudad Real",na=False)]
values = df[df['Location'].str.contains("Córdoba",na=False)]
values = df[df['Location'].str.contains("Cuenca",na=False)]
values = df[df['Location'].str.contains("Formentera",na=False)]
values = df[df['Location'].str.contains("Girona",na=False)]
values.to_csv(r'populismo_ciudad.csv', index = False)

Many thanks!!!

Upvotes: 0

Views: 45

Answers (2)

Quixotic22

Reputation: 2924

You are overwriting the values variable on every line, so only the result of the last filter (Girona) is kept. A more concise approach would be along these lines:

values = df[df['Location'].isin(["A Coruña", "Alava", ...])]
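To see why only the last city survives, here is a minimal sketch on a small made-up DataFrame (the sample rows are hypothetical and stand in for populismo_merge.csv). It contrasts repeated assignment with combining the boolean masks using the | operator:

```python
import pandas as pd

# Hypothetical sample data standing in for populismo_merge.csv
df = pd.DataFrame({
    "Tweet": ["a", "b", "c", "d"],
    "Location": ["Girona", "Barcelona, Spain", None, "Madrid"],
})

# Each assignment replaces the previous result,
# so only the last filter (Girona) is kept.
values = df[df["Location"].str.contains("Barcelona", na=False)]
values = df[df["Location"].str.contains("Girona", na=False)]
last_only = values

# Combining the boolean masks with | keeps rows matching either city.
mask = (
    df["Location"].str.contains("Barcelona", na=False)
    | df["Location"].str.contains("Girona", na=False)
)
combined = df[mask]

print(last_only["Tweet"].tolist())
print(combined["Tweet"].tolist())
```

With twenty cities, chaining masks by hand gets long, which is why passing a single list of cities is usually the more practical route.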

Upvotes: 1

Corralien

Reputation: 120559

Use isin:

import pandas as pd

path_to_file = "populismo_merge.csv"

df = pd.read_csv(path_to_file, encoding='utf-8', sep=',')

# Cities to keep
cities = ['A Coruña', 'Alava', 'Albacete', 'Alicante', 'Almería',
          'Asturias', 'Avila', 'Badajoz', 'Barcelona', 'Burgos',
          'Cáceres', 'Cádiz', 'Cantabria', 'Castellón', 'Ceuta',
          'Ciudad Real', 'Córdoba', 'Cuenca', 'Formentera', 'Girona']

# Keep the rows whose Location is exactly one of the listed cities
values = df[df['Location'].isin(cities)]
values.to_csv('populismo_ciudad.csv', index=False)
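One caveat: isin matches whole cell values, whereas the str.contains calls in the question match substrings. If the Location column holds free text such as "Barcelona, Spain" (an assumption, since the question does not show the data), isin would skip those rows. A possible sketch that keeps the substring behaviour is to join the city names into one regular expression; the sample rows below are hypothetical:

```python
import re

import pandas as pd

# Hypothetical sample data; the real Location values are not shown in the question
df = pd.DataFrame({
    "Tweet": ["a", "b", "c", "d", "e"],
    "Location": ["Girona", "Barcelona, Spain", None, "Madrid", "Cádiz (Andalucía)"],
})

cities = ["Barcelona", "Cádiz", "Girona"]

# isin only matches cells that are exactly equal to one of the cities
exact = df[df["Location"].isin(cities)]

# Build one pattern like "Barcelona|Cádiz|Girona"; re.escape guards against
# regex metacharacters that might appear in a city name
pattern = "|".join(re.escape(city) for city in cities)

# str.contains treats the pattern as a regex by default; na=False skips missing values
partial = df[df["Location"].str.contains(pattern, na=False)]

print(exact["Tweet"].tolist())
print(partial["Tweet"].tolist())
```

Which variant is appropriate depends on how the locations were entered: exact matching is stricter and avoids accidental hits, while substring matching mirrors the original code.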

Upvotes: 1
