Reputation: 31
I have a DataFrame of weather data that I've compiled from a csv file byte string. Each line includes a month of data for one element, i.e. temp max in June 1949 is in the line where Year = 1949, Month = 06, Element = TMAX. The temp value for the 1st June is value1, 2nd June is value2 and so on.
I've copied to clipboard a snippet of the df and pasted below, you should be able to copy and use pd.read_clipboard(sep=',')
to recreate this df.
What I need to do is extract the values for each day of the month (i'm not too worried about the mflag, sflag, qflag at this stage), attach the element the values refer to and make a new dataframe that looks like this:
Date | TMAX | PRCP |
---|---|---|
1949-06-01 | 210 | 24 |
1949-06-02 | 189 | 36 |
,ID,year,month,element,value1,mflag1,qflag1,sflag1,value2,mflag2,qflag2,sflag2,value3,mflag3,qflag3,sflag3,value4,mflag4,qflag4,sflag4,value5,mflag5,qflag5,sflag5,value6,mflag6,qflag6,sflag6,value7,mflag7,qflag7,sflag7,value8,mflag8,qflag8,sflag8,value9,mflag9,qflag9,sflag9,value10,mflag10,qflag10,sflag10,value11,mflag11,qflag11,sflag11,value12,mflag12,qflag12,sflag12,value13,mflag13,qflag13,sflag13,value14,mflag14,qflag14,sflag14,value15,mflag15,qflag15,sflag15,value16,mflag16,qflag16,sflag16,value17,mflag17,qflag17,sflag17,value18,mflag18,qflag18,sflag18,value19,mflag19,qflag19,sflag19,value20,mflag20,qflag20,sflag20,value21,mflag21,qflag21,sflag21,value22,mflag22,qflag22,sflag22,value23,mflag23,qflag23,sflag23,value24,mflag24,qflag24,sflag24,value25,mflag25,qflag25,sflag25,value26,mflag26,qflag26,sflag26,value27,mflag27,qflag27,sflag27,value28,mflag28,qflag28,sflag28,value29,mflag29,qflag29,sflag29,value30,mflag30,qflag30,sflag30,value31,mflag31,qflag31,sflag31
14,40223,1945,12,TAVG, 244,H, ,S, 228,H, ,S, 239,H, ,S, 231,H, ,S, 195,H, ,S, 216,H, ,S, 232,H, ,S, 230,H, ,S, 253,H, ,S, 260,H, ,S, 251,H, ,S, 260,H, ,S, 256,H, ,S, 257,H, ,S, 252,H, ,S, 256,H, ,S, 253,H, ,S, 241,H, ,S, 209,H, ,S, 226,H, ,S, 229,H, ,S, 239,H, ,S, 239,H, ,S, 234,H, ,S, 233,H, ,S, 232,H, ,S, 234,H, ,S, 231,H, ,S, 232,H, ,S, 232,H, ,S, 241,H, ,S
15,40223,1946,01,TAVG, 249,H, ,S, 259,H, ,S, 256,H, ,S, 271,H, ,S, 277,H, ,S, 264,H, ,S, 231,H, ,S, 236,H, ,S, 235,H, ,S, 249,H, ,S, 257,H, ,S, 256,H, ,S, 250,H, ,S, 253,H, ,S, 261,H, ,S, 262,H, ,S, 257,H, ,S, 264,H, ,S, 264,H, ,S, 264,H, ,S, 272,H, ,S, 274,H, ,S,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , ,
16,40223,1949,06,TMAX,-9999, , , ,-9999, , , , 210, , ,a, 146, , ,a, 189, , ,a, 216, , ,a, 211, , ,a, 210, , ,a, 214, , ,a, 174, , ,a, 225, , ,a, 187, , ,a, 168, , ,a, 207, , ,a, 218, , ,a, 205, , ,a, 174, , ,a, 192, , ,a, 186, , ,a, 199, , ,a, 176, , ,a, 141, , ,a, 173, , ,a, 190, , ,a, 189, , ,a, 197, , ,a, 207, , ,a, 216, , ,a, 202, , ,a, 204, , ,a,-9999, , ,
17,40223,1949,06,TMIN,-9999, , , ,-9999, , , , 136, , ,a, 60, , ,a, 113, , ,a, 84, , ,a, 91, , ,a, 88, , ,a, 126, , ,a, 131, , ,a, 129, , ,a, 84, , ,a, 81, , ,a, 113, , ,a, 110, , ,a, 133, , ,a, 136, , ,a, 108, , ,a, 111, , ,a, 122, , ,a, 107, , ,a, 81, , ,a, 59, , ,a, 57, , ,a, 44, , ,a, 42, , ,a, 66, , ,a, 77, , ,a, 60, , ,a, 68, , ,a,-9999, , ,
18,40223,1949,06,PRCP, 277, , ,a, 239, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 3, , ,a, 53, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 8, , ,a, 3, , ,a, 3, , ,a, 71, , ,a, 107, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a, 0, , ,a,-9999, , ,
19,40223,1949,06,TAVG, 156,H, ,S, 174,H, ,S, 136,H, ,S, 129,H, ,S, 146,H, ,S, 153,H, ,S, 146,H, ,S, 165,H, ,S, 164,H, ,S, 154,H, ,S, 156,H, ,S, 126,H, ,S, 144,H, ,S, 153,H, ,S, 169,H, ,S, 167,H, ,S, 151,H, ,S, 154,H, ,S, 153,H, ,S, 150,H, ,S, 132,H, ,S, 102,H, ,S, 106,H, ,S, 109,H, ,S, 113,H, ,S, 121,H, ,S, 133,H, ,S, 132,H, ,S, 132,H, ,S, 129,H, ,S,-9999, , ,
Thanks in advance
Upvotes: 2
Views: 139
Reputation: 9857
There's probably a more straightforward way to do what you want, but this code will work, and can be followed step by step.
Note, I saved the data in a file named weather_data.csv
- using the clipboard wasn't quite working after a while.:)
import pandas as pd
# df = pd.read_clipboard(sep=',')
# df.to_csv('weather_data.csv')
df = pd.read_csv('weather_data.csv')
keep_columns = [col for col in df.columns[2:] if not 'flag' in col]
df = df[keep_columns]
df_melt = pd.melt(df, id_vars=['year', 'month', 'element'], value_vars=keep_columns[3:])
df_melt['day'] = df_melt['variable'].str[5:].astype(int)
df_melt = df_melt.query('value!=-9999')
df_melt['date'] = pd.to_datetime(df_melt[['year', 'month', 'day']])
df_final = df_melt[['date', 'element', 'value']].pivot(index='date', columns='element')
print(df_final)
Sample output:
value
element PRCP TAVG TMAX TMIN
date
1945-12-01 NaN 244.0 NaN NaN
1945-12-02 NaN 228.0 NaN NaN
1945-12-03 NaN 239.0 NaN NaN
1945-12-04 NaN 231.0 NaN NaN
1945-12-05 NaN 195.0 NaN NaN
... ... ... ... ...
1949-06-26 0.0 121.0 197.0 42.0
1949-06-27 0.0 133.0 207.0 66.0
1949-06-28 0.0 132.0 216.0 77.0
1949-06-29 0.0 132.0 202.0 60.0
1949-06-30 0.0 129.0 204.0 68.0
Upvotes: 3
Reputation: 18416
Well, that's really a tedious task in terms writing the code to break down the structure you have.
Start by pivoting the dataframe you have passing ID
, year
, and month
as index, element
as column, and columns like value1, value2,...
as values (you can use DataFrame.filter
to get these values columns), then stack at level 0, take only PRCP
and TMAX
columns, then reset index at level -1.
Then extract the day values from level_3, (you can use regex value(\d+)
to get day values, then reset the index at level 1 and 2.
Finally get the Date
value out of year
, month
, and day
columns.
out=df.pivot(['ID', 'year', 'month'],
['element'],
df.filter(like='value').columns).stack(0)[['PRCP', 'TMAX']].reset_index(-1)
out['day']=out['level_3'].str.extract('value(\d+)')
out = out.reset_index([1,2])
out['Date'] = pd.to_datetime((out['year']
.astype(str)
+out['month']
.astype(str).str.zfill(2)
+out['day'].astype(str).str.zfill(2)),
format='%Y%m%d', errors='coerce')
OUTPUT:
element year month level_3 PRCP TMAX day Date
ID
40223 1945 12 value1 NaN NaN 1 1945-12-01
40223 1945 12 value10 NaN NaN 10 1945-12-10
40223 1945 12 value11 NaN NaN 11 1945-12-11
40223 1945 12 value12 NaN NaN 12 1945-12-12
40223 1945 12 value13 NaN NaN 13 1945-12-13
... ... ... ... ... .. ...
40223 1949 6 value5 0.0 189.0 5 1949-06-05
40223 1949 6 value6 0.0 216.0 6 1949-06-06
40223 1949 6 value7 0.0 211.0 7 1949-06-07
40223 1949 6 value8 0.0 210.0 8 1949-06-08
40223 1949 6 value9 3.0 214.0 9 1949-06-09
[93 rows x 7 columns]
Upvotes: 2