peachy85
peachy85

Reputation: 31

Extract daily values from row and make new row with date

I have a DataFrame of weather data that I've compiled from a csv file byte string. Each line includes a month of data for one element, i.e. temp max in June 1949 is in the line where Year = 1949, Month = 06, Element = TMAX. The temp value for the 1st June is value1, 2nd June is value2 and so on.

I've copied to clipboard a snippet of the df and pasted below, you should be able to copy and use pd.read_clipboard(sep=',') to recreate this df.

What I need to do is extract the values for each day of the month (i'm not too worried about the mflag, sflag, qflag at this stage), attach the element the values refer to and make a new dataframe that looks like this:

Date TMAX PRCP
1949-06-01 210 24
1949-06-02 189 36
,ID,year,month,element,value1,mflag1,qflag1,sflag1,value2,mflag2,qflag2,sflag2,value3,mflag3,qflag3,sflag3,value4,mflag4,qflag4,sflag4,value5,mflag5,qflag5,sflag5,value6,mflag6,qflag6,sflag6,value7,mflag7,qflag7,sflag7,value8,mflag8,qflag8,sflag8,value9,mflag9,qflag9,sflag9,value10,mflag10,qflag10,sflag10,value11,mflag11,qflag11,sflag11,value12,mflag12,qflag12,sflag12,value13,mflag13,qflag13,sflag13,value14,mflag14,qflag14,sflag14,value15,mflag15,qflag15,sflag15,value16,mflag16,qflag16,sflag16,value17,mflag17,qflag17,sflag17,value18,mflag18,qflag18,sflag18,value19,mflag19,qflag19,sflag19,value20,mflag20,qflag20,sflag20,value21,mflag21,qflag21,sflag21,value22,mflag22,qflag22,sflag22,value23,mflag23,qflag23,sflag23,value24,mflag24,qflag24,sflag24,value25,mflag25,qflag25,sflag25,value26,mflag26,qflag26,sflag26,value27,mflag27,qflag27,sflag27,value28,mflag28,qflag28,sflag28,value29,mflag29,qflag29,sflag29,value30,mflag30,qflag30,sflag30,value31,mflag31,qflag31,sflag31
14,40223,1945,12,TAVG,  244,H, ,S,  228,H, ,S,  239,H, ,S,  231,H, ,S,  195,H, ,S,  216,H, ,S,  232,H, ,S,  230,H, ,S,  253,H, ,S,  260,H, ,S,  251,H, ,S,  260,H, ,S,  256,H, ,S,  257,H, ,S,  252,H, ,S,  256,H, ,S,  253,H, ,S,  241,H, ,S,  209,H, ,S,  226,H, ,S,  229,H, ,S,  239,H, ,S,  239,H, ,S,  234,H, ,S,  233,H, ,S,  232,H, ,S,  234,H, ,S,  231,H, ,S,  232,H, ,S,  232,H, ,S,  241,H, ,S
15,40223,1946,01,TAVG,  249,H, ,S,  259,H, ,S,  256,H, ,S,  271,H, ,S,  277,H, ,S,  264,H, ,S,  231,H, ,S,  236,H, ,S,  235,H, ,S,  249,H, ,S,  257,H, ,S,  256,H, ,S,  250,H, ,S,  253,H, ,S,  261,H, ,S,  262,H, ,S,  257,H, ,S,  264,H, ,S,  264,H, ,S,  264,H, ,S,  272,H, ,S,  274,H, ,S,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , 
16,40223,1949,06,TMAX,-9999, , , ,-9999, , , ,  210, , ,a,  146, , ,a,  189, , ,a,  216, , ,a,  211, , ,a,  210, , ,a,  214, , ,a,  174, , ,a,  225, , ,a,  187, , ,a,  168, , ,a,  207, , ,a,  218, , ,a,  205, , ,a,  174, , ,a,  192, , ,a,  186, , ,a,  199, , ,a,  176, , ,a,  141, , ,a,  173, , ,a,  190, , ,a,  189, , ,a,  197, , ,a,  207, , ,a,  216, , ,a,  202, , ,a,  204, , ,a,-9999, , , 
17,40223,1949,06,TMIN,-9999, , , ,-9999, , , ,  136, , ,a,   60, , ,a,  113, , ,a,   84, , ,a,   91, , ,a,   88, , ,a,  126, , ,a,  131, , ,a,  129, , ,a,   84, , ,a,   81, , ,a,  113, , ,a,  110, , ,a,  133, , ,a,  136, , ,a,  108, , ,a,  111, , ,a,  122, , ,a,  107, , ,a,   81, , ,a,   59, , ,a,   57, , ,a,   44, , ,a,   42, , ,a,   66, , ,a,   77, , ,a,   60, , ,a,   68, , ,a,-9999, , , 
18,40223,1949,06,PRCP,  277, , ,a,  239, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    3, , ,a,   53, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    8, , ,a,    3, , ,a,    3, , ,a,   71, , ,a,  107, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,-9999, , , 
19,40223,1949,06,TAVG,  156,H, ,S,  174,H, ,S,  136,H, ,S,  129,H, ,S,  146,H, ,S,  153,H, ,S,  146,H, ,S,  165,H, ,S,  164,H, ,S,  154,H, ,S,  156,H, ,S,  126,H, ,S,  144,H, ,S,  153,H, ,S,  169,H, ,S,  167,H, ,S,  151,H, ,S,  154,H, ,S,  153,H, ,S,  150,H, ,S,  132,H, ,S,  102,H, ,S,  106,H, ,S,  109,H, ,S,  113,H, ,S,  121,H, ,S,  133,H, ,S,  132,H, ,S,  132,H, ,S,  129,H, ,S,-9999, , , 

Thanks in advance

Upvotes: 2

Views: 139

Answers (2)

norie
norie

Reputation: 9857

There's probably a more straightforward way to do what you want, but this code will work, and can be followed step by step.

Note, I saved the data in a file named weather_data.csv - using the clipboard wasn't quite working after a while.:)

import pandas as pd

# df = pd.read_clipboard(sep=',')
# df.to_csv('weather_data.csv')

df = pd.read_csv('weather_data.csv')

keep_columns = [col for col in df.columns[2:] if not 'flag' in col]
df = df[keep_columns]

df_melt = pd.melt(df, id_vars=['year', 'month', 'element'], value_vars=keep_columns[3:])

df_melt['day'] = df_melt['variable'].str[5:].astype(int)

df_melt = df_melt.query('value!=-9999')

df_melt['date'] = pd.to_datetime(df_melt[['year', 'month', 'day']])
                              
df_final = df_melt[['date', 'element', 'value']].pivot(index='date', columns='element')

print(df_final)

Sample output:

          value
element     PRCP   TAVG   TMAX  TMIN
date
1945-12-01   NaN  244.0    NaN   NaN
1945-12-02   NaN  228.0    NaN   NaN
1945-12-03   NaN  239.0    NaN   NaN
1945-12-04   NaN  231.0    NaN   NaN
1945-12-05   NaN  195.0    NaN   NaN
...          ...    ...    ...   ...
1949-06-26   0.0  121.0  197.0  42.0
1949-06-27   0.0  133.0  207.0  66.0
1949-06-28   0.0  132.0  216.0  77.0
1949-06-29   0.0  132.0  202.0  60.0
1949-06-30   0.0  129.0  204.0  68.0

Upvotes: 3

ThePyGuy
ThePyGuy

Reputation: 18416

Well, that's really a tedious task in terms writing the code to break down the structure you have.

Start by pivoting the dataframe you have passing ID, year, and month as index, element as column, and columns like value1, value2,... as values (you can use DataFrame.filter to get these values columns), then stack at level 0, take only PRCP and TMAX columns, then reset index at level -1.

Then extract the day values from level_3, (you can use regex value(\d+) to get day values, then reset the index at level 1 and 2.

Finally get the Date value out of year, month, and day columns.

out=df.pivot(['ID', 'year',  'month'],
             ['element'],
             df.filter(like='value').columns).stack(0)[['PRCP', 'TMAX']].reset_index(-1)
out['day']=out['level_3'].str.extract('value(\d+)')
out = out.reset_index([1,2])
out['Date'] = pd.to_datetime((out['year']
                             .astype(str)
                            +out['month']
                            .astype(str).str.zfill(2)
                            +out['day'].astype(str).str.zfill(2)),
              format='%Y%m%d', errors='coerce')

OUTPUT:

element  year  month  level_3  PRCP   TMAX day       Date
ID                                                       
40223    1945     12   value1   NaN    NaN   1 1945-12-01
40223    1945     12  value10   NaN    NaN  10 1945-12-10
40223    1945     12  value11   NaN    NaN  11 1945-12-11
40223    1945     12  value12   NaN    NaN  12 1945-12-12
40223    1945     12  value13   NaN    NaN  13 1945-12-13
       ...    ...      ...   ...    ...  ..        ...
40223    1949      6   value5   0.0  189.0   5 1949-06-05
40223    1949      6   value6   0.0  216.0   6 1949-06-06
40223    1949      6   value7   0.0  211.0   7 1949-06-07
40223    1949      6   value8   0.0  210.0   8 1949-06-08
40223    1949      6   value9   3.0  214.0   9 1949-06-09
[93 rows x 7 columns]

Upvotes: 2

Related Questions