Reputation: 1191
I'm trying to build a web app using Django where the user uploads a CSV file, possibly a large one. The code then cleans the file of bad data, and the user can run queries against the clean data.
As I understand it, every time a user makes a query the whole view runs again, which means the file is read and cleaned from scratch each time.
Question:
Is there any way that, once the CSV data is clean, it stays in memory so the user can query the clean data?
import numpy as np
import pandas as pd
from django.http import JsonResponse


def converter(num):
    # Parse a cell as a float; fall back to NaN if it cannot be parsed.
    try:
        return float(num)
    except ValueError:
        try:
            num = num.replace("-", '0.0').replace(',', '')
            return float(num)
        except ValueError:
            return np.nan


def get_clean_data(request):
    # Read the data from csv file:
    df = pd.read_csv("data.csv")
    # Clean the data and send JSON response
    df['month'] = df['month'].str.split("-", expand=True)[1]
    df[df.columns[8:]] = df[df.columns[8:]].astype(str).applymap(converter)
    selected_year = df[df["Departure Datetime: Year (YYYY)"] == 2015]
    data_for_user = (selected_year.groupby(
        by="route").sum().sort_values(by="revenue").to_json())
    return JsonResponse(data_for_user, safe=False)
Upvotes: 3
Views: 998
Reputation: 22994
One way to achieve this could be to cache the dataframe in memory after it has been cleaned. Subsequent requests could then use the cleaned version from the cache.
from django.core.cache import cache


def get_clean_data(request):
    # Check the cache for cleaned data
    df = cache.get('cleaned_data')
    if df is None:
        # Cache miss: read the data from csv file
        df = pd.read_csv("data.csv")
        # Clean the data
        df['month'] = df['month'].str.split("-", expand=True)[1]
        df[df.columns[8:]] = df[df.columns[8:]].astype(str).applymap(converter)
        # Put it in the cache for 600 seconds (10 minutes)
        cache.set('cleaned_data', df, timeout=600)
    selected_year = df[df["Departure Datetime: Year (YYYY)"] == 2015]
    data_for_user = (selected_year.groupby(
        by="route").sum().sort_values(by="revenue").to_json())
    return JsonResponse(data_for_user, safe=False)
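The view above stores everything under the single key 'cleaned_data', which suits one fixed data.csv. If each user uploads a different file, one possible refinement is to derive the key from the file's contents so that different uploads do not overwrite each other. This is only a minimal sketch, assuming the upload is available as bytes; `cache_key_for_upload` and its `prefix` argument are hypothetical names, not part of Django:

```python
import hashlib


def cache_key_for_upload(file_bytes: bytes, prefix: str = "cleaned_data") -> str:
    """Build a cache key that depends on the file's content (hypothetical helper)."""
    # SHA-256 of the raw bytes: identical uploads share a key,
    # different uploads get different keys.
    digest = hashlib.sha256(file_bytes).hexdigest()
    return f"{prefix}:{digest}"
```

The resulting string could then be passed to cache.get and cache.set in place of the literal 'cleaned_data'.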
Be careful, though: a very large CSV file can consume a lot of memory when cached.
Django supports a number of cache backends, from simple local-memory caching to Memcached.
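As a rough illustration of those backends, the cache is selected through the CACHES setting in settings.py. The sketch below shows a local-memory and a Memcached configuration side by side; the names LOCMEM_CACHE and MEMCACHED_CACHE, the LOCATION values, and the Memcached address are assumptions for illustration, and the PyMemcacheCache backend assumes Django 3.2 or later with pymemcache installed:

```python
# settings.py (sketch): choose ONE configuration as the "default" cache.

# Local-memory cache: lives inside each server process, needs no extra service.
LOCMEM_CACHE = {
    "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    "LOCATION": "cleaned-csv-cache",  # hypothetical name for this cache
}

# Memcached: a separate service shared by all server processes.
MEMCACHED_CACHE = {
    "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
    "LOCATION": "127.0.0.1:11211",  # assumed address of a local memcached
}

CACHES = {"default": LOCMEM_CACHE}
```

The local-memory cache is private to each process, so with several worker processes each one would clean and cache the file separately. Memcached is shared, but by default it limits a single item to 1 MB, so a large pickled DataFrame may not fit without tuning.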
Upvotes: 2