This article is the third in the series of posts from the Data Capacity Project collaboration with Keystone Foundation. It is also the first in the series that will be offering tips on working with data.
Almost every organisation working in the development sector has at some time or another, conducted a data collection exercise. It usually starts with developing 'formats' or surveys and then filling them up through interviews, discussions, and/or observations. Sometimes we also get secondary data, such as demographic data or rainfall data that we need to enter into a computer file for archiving, and/or printing out. All organizations have to go through this process and there are common practices that are prevalent but are not the best way to manage data. In this series of articles we examine the typical use cases and ways to avoid common errors and techniques to improve the quality of our work.
When one has to enter some data, say from a field survey format, one usually opens a spreadsheet (MS Excel, Libreoffice Calc etc.) and creates columns just as they are in the hard copy format. In some cases, people even use a word processor (MS Word, Libreoffice Writer etc.) and create tables for entering data. The common tendency is to create a file where the data is easily entered by referring to the hard copy formats. It is rarely seen as worthwhile to make sure you can access and use the data later, that bridge is often crossed when the time comes! There are a number of factors here that can cause trouble when trying to analyse data. Given below are some Dos and Don'ts with regard to working on a dataset in a spreadsheet.
Wrong method
Location and year of observation |
Bangalore, 2010 |
Chennai, 2009 |
|
Correct Method
Location of observation |
Year of observation |
Bangalore |
2010 |
Chennai |
2009 |
Option 1 – Simplest to enter, but difficult to group farmers by particular fertilisers used.
Farmer ID |
Fertilisers Used |
1 |
NPK, DAP, Urea |
2 |
NPK, DAP |
3 |
|
Option 2 – Works when the number of alternatives are few, E.g. only three fertilisers.
Farmer ID |
Fertiliser Used 1 |
Fertiliser Used 2 |
Fertiliser Used 3 |
1 |
NPK |
DAP |
Urea |
2 |
NPK |
DAP |
None |
3 |
None |
None |
None |
Option 3 - Works when the number of alternatives are few, E.g. only three fertilisers.
Farmer ID |
NPK Used |
DAP Used |
Urea Used |
1 |
Y |
Y |
Y |
2 |
Y |
Y |
N |
3 |
N |
N |
N |
Option 4 – Works better when a database rather than a spreadsheet is being used. Fertilisers used can then be a separate table linked by Farmer ID to other tables.
Farmer ID |
Fertiliser Used |
1 |
NPK |
1 |
DAP |
1 |
Urea |
2 |
NPK |
2 |
DAP |
3 |
None |
Wrong method
Area |
Ha |
Correct Method
Area_ha |
Farmer ID |
Landholding_bigha |
Landholding_acres |
1 |
5 |
1 |
2 |
10 |
2 |
Wrong method
Area |
Ha |
Correct Method
Area_ha |
Wrong method 1
Water level (ft) |
|
Pre-monsoon |
Post-monsoon |
Correct method 1
water_level_pre_monsoon_ft |
water_level_post_monsoon_ft |
|
|
Wrong method 2
Farmer ID |
Village_name |
1 |
A |
2 |
B |
3 |
C |
4 |
D |
5 |
|
6 |
Correct Method 2
Farmer ID |
Village_name |
1 |
A |
2 |
B |
3 |
C |
4 |
D |
5 |
D |
6 |
D |