# Humino: Predicting when plants will need watering

In the last post I wrote a follow-along guide to building an Arduino-based soil monitoring system. In this post I want to describe how I used linear regression to predict how much time is left until the plants are dry again.

```
Hibiskus        wet (56.40%)
Walnuss         wet (50.18%)
Dieffenb B      2 days (53.21%)
Dieffenb H      3 days (64.22%)
Yucca           7 days (45.06%)
```

Edit: After running this for two weeks I found the predictions to be unreliable and of little practical use. When the sensor values fall steadily the prediction works well, but sun and warmth often influence the readings enough to make them unpredictable.

The measured data for each plant is a time series of humidity values. These values should fall steadily while the plant is slowly drying out. Applying simple linear regression yields a linear function that approximates this drying process. What we want to know is how much time is left until this function reaches a certain threshold, say 50% humidity.
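To make this concrete, here is a small sketch on synthetic data (the values, the noise level and the 50% target are made up for illustration, they are not Humino measurements). It fits a straight line with NumPy's `np.polyfit` and solves `slope * step + intercept = target` for the step at which the line reaches the target. Note that this solves the fitted line itself, whereas the code later in this post starts from the last measured value and only uses the slope.

```python
import numpy as np

# Synthetic example: humidity (%) sampled every 15 minutes, drying linearly plus noise
rng = np.random.default_rng(0)
steps = np.arange(96)  # 96 steps of 15 minutes = 24 hours
humidity = 60.0 - 0.06 * steps + rng.normal(0.0, 0.2, steps.size)

# Fit humidity ~ slope * step + intercept with ordinary least squares
slope, intercept = np.polyfit(steps, humidity, deg=1)

# Solve slope * step + intercept = target for the step where the line hits the target
target = 50.0
crossing_step = (target - intercept) / slope
print(f"slope: {slope:.3f} %/step, reaches {target}% at step {crossing_step:.0f}")
```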

For the prediction, only the last 24 hours of data are taken into account: if the plants were watered within the fitted window, the linear model would predict rising humidity values. And if the plants were watered within the past 24 hours, they won't need watering for some time anyway.
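The code in this post selects the 24 hours as a fixed number of samples (a negative offset). As an alternative sketch, not what Humino does, the window can also be selected by time, which keeps it at 24 hours even if some readings are missing. The series here is synthetic and assumes a `DatetimeIndex`, as in the Humino data.

```python
import pandas as pd

# Hypothetical readings every 15 minutes over two days
index = pd.date_range("2024-01-01", periods=192, freq="15min")
readings = pd.Series(range(192), index=index, dtype=float)

# Keep only readings from the 24 hours before the newest measurement
cutoff = readings.index[-1] - pd.Timedelta(hours=24)
last_day = readings[readings.index > cutoff]

print(len(last_day))
```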

The regression can be computed with the `linear_model` module from scikit-learn (`sklearn`).

```python
from sklearn import linear_model

regr = linear_model.LinearRegression()
regr.fit(x, y)  # x: step numbers, y: humidity values
print(regr.coef_)
# -0.05969473
```
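Since `x` and `y` are not defined in the snippet above, here is a self-contained version on synthetic data (a made-up series losing exactly 0.06 percentage points per step) to show what `coef_` contains. With a 1D `y` as here, `coef_` has shape `(1,)`; with a 2D column vector `y`, as used in `predict_value`, it has shape `(1, 1)`.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data: 96 steps (24 h at 15 min), losing exactly 0.06 percentage points per step
x = np.arange(96).reshape(-1, 1)  # scikit-learn expects a 2D feature matrix
y = 56.0 - 0.06 * x.ravel()        # 1D target, so coef_ has shape (1,)

regr = linear_model.LinearRegression()
regr.fit(x, y)
print(regr.coef_, regr.intercept_)
```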

This means that the plant loses about 0.06 percentage points of humidity per step of the data. Every step corresponds to 15 minutes, the interval used for resampling, so the time left in minutes can be computed as:

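``-15 * (last_observation - target) / regression_coefficient``

The leading minus sign makes the result positive: while the plant is drying, `last_observation - target` is positive and the coefficient is negative.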
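As a quick check of the formula, here it is as a small Python function. The function name `minutes_left` and the example numbers are hypothetical; `step_minutes` defaults to the 15-minute resampling interval.

```python
def minutes_left(last_observation, target, regression_coefficient, step_minutes=15):
    """Minutes until the trend falls from last_observation to target.

    regression_coefficient is the slope in percentage points per step;
    it is negative while the soil is drying.
    """
    steps_left = (target - last_observation) / regression_coefficient
    return step_minutes * steps_left


# Example: 53.0% now, target 50%, losing 0.06 percentage points per 15-minute step
print(minutes_left(53.0, 50.0, -0.06))  # 750.0 minutes, i.e. 12.5 hours
```

A positive coefficient (rising humidity) gives a negative result, which is the case reported as `wet` further below.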

To do this computation with the data from Humino, the index has to be converted from timestamps to numbers of steps. It's not possible to just assign a range of integers as the index, because then we would lose the information about gaps in the measurements.
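A small sketch of the conversion, with made-up timestamps that contain a gap: dividing the elapsed time by a `pd.Timedelta` of one step yields step numbers that keep the gap, while a plain integer range does not. `STEP = 15` mirrors the interval mentioned above; in Humino it comes from `config.STEP`.

```python
import pandas as pd

STEP = 15  # resampling interval in minutes (15 in this post)

# Timestamps with a gap: the readings at 00:45 and 01:00 are missing
index = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:15",
                        "2024-01-01 00:30", "2024-01-01 01:15"])

# Elapsed time since the first reading, expressed in steps of STEP minutes
steps = (index - index[0]) / pd.Timedelta(minutes=STEP)
print(steps.tolist())            # [0.0, 1.0, 2.0, 5.0]: the gap is preserved

# A plain integer range would hide the gap
print(list(range(len(index))))   # [0, 1, 2, 3]
```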

So the complete code is now:

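```python
from datetime import timedelta

import pandas as pd
from sklearn import linear_model

import config  # Humino's settings; config.STEP is the resampling interval in minutes


def predict_value(data, target):
    # Offset to select 24h of data (negative integer, usable for slicing)
    offset = -(24 * 60 // config.STEP)

    # Data values as a column vector from offset
    y = data.values.reshape(-1, 1)[offset:]

    # Set index to zero at offset by subtracting the timestamp found there
    index_from_offset = data.index - data.index[offset]

    # Datetime index as number of steps since offset; gaps in the data are preserved
    x = (index_from_offset / pd.Timedelta(minutes=config.STEP))[offset:] \
        .to_numpy().reshape(-1, 1)

    regr = linear_model.LinearRegression()
    regr.fit(x, y)

    # Slope in percentage points per step (coef_ has shape (1, 1) for a 2D y)
    slope = regr.coef_.item()

    # Compute time remaining in minutes by solving for the target value
    rem = -config.STEP * (y[-1].item() - target) / slope
    return timedelta(minutes=rem)
```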

The result can be written into the legend of the result plot or simply embedded in the text output. A negative value means that the humidity has been rising over the last 24 hours, i.e. the plant was watered during that time; in this case I just output `wet`.
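One possible way to produce the text output shown below is sketched here. The helper `describe` is hypothetical, and it assumes the remaining time is shown as whole days (`timedelta.days`, which truncates); the column width of 15 characters matches the listing.

```python
from datetime import timedelta


def describe(name, remaining, humidity):
    """Format one line of the status output (hypothetical helper).

    A negative remaining time means humidity has been rising, i.e. the
    plant was watered recently, so "wet" is shown instead of a duration.
    """
    if remaining < timedelta(0):
        status = "wet"
    else:
        status = f"{remaining.days} days"
    return f"{name:<15} {status} ({humidity:.2f}%)"


print(describe("Dieffenb B", timedelta(days=2, hours=5), 53.21))
print(describe("Hibiskus", timedelta(hours=-3), 56.40))
```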

Results:

```
Hibiskus        wet (56.40%)
Walnuss         wet (50.18%)
Dieffenb B      2 days (53.21%)
Dieffenb H      3 days (64.22%)
Yucca           7 days (45.06%)
```

Please also see part 1 and part 3 of this series.