Race (1, 2 and 4) and two of them have duplicates. We can seeįrom the list of the data set that there are three groups of observations of How many duplicate observations in variable race only. For example, in the followingĮxample, we add the variable race after dups. Groups formed: 1 containing 3 observations Returns information on the number of groups of observations that have duplicatesĪnd the number of duplicates in each group. One group of observations with duplicates consisting of observation number 1, 7 and 8. Subset of the larger dataset used above with added duplicates.įirst we enter the data: input id female race read Help? for more information about using search). How can I use the search command to search for programs and get additional Program in Stata, but can be installed over the internet using search dups (see In Stata, several programs are available to detect theĭuplicates and can also optionally drop the duplicates. It appears that we have gotten rid of the duplicate observations. We check that there are no other duplicate observations. Of each group with duplicate observations. The command drops all observations except the first occurrence Now, we can use the duplicates drop command to drop the duplicate We correct the scores, andĪfter the correction, the results from duplicates report and duplicates Were incorrect, and the real score was 44. From this, it would be advisable to check which Math scores over the duplicate observations. It is apparent that id = 1 has different values on New variable dup_id that assigns a 1 if the id isĭuplicated, and 0 if it appears once. Like that given from duplicates list, we use the duplicates tag command to create a The duplicates list and duplicates list id outputs. The other variables to see which variables are causing the difference between Now we see which five subjects are duplicated however, the duplicate list only lists the variable specified.
This duplicates list corresponds to listing those observations withĭuplicate rows however, as found with duplicates report, it does not | group: obs: id female ses read write math | Next we list duplicate observations with the duplicates listĬommand. (copies), which leads to a total of 10 questionable observations based on Unique id values, and five ids (surplus) that appear two times each Second command duplicates report id shows that we have 195 The same, and this command correctly did not pick them up. Note that in the duplicate whose value we changed (id=1), the two rows are not technically The duplicates report output shows the number of replicate rows over all variables. The duplicates examples command lists one example of each duplicatedĬlearly, the output from duplicates report and duplicates We could have used the duplicates examplesĬommand. Which gives the number of replicate rows by the variables specified This is followed by duplicate reports id, The duplicates report command to see the number of duplicate rows in For subject id =1, all of her values are duplicated exceptįor her math score one duplicate score is set to 84. This leads to 195 unique and 5 duplicated observations in theĭataset.
To add the duplicate observations, we sort the data by id, thenĭuplicate the first five observations (id = 1 to 5). Happen in practice we often search for "duplicate" cases that are not The rationale for changing a value is to mimic what may Also, to evaluate the sensitivity of theĭuplicate observations.
The data, and then use the duplicates command to detect which Therefore, we add five duplicate observations to This example uses the High School and Beyond dataset, which has noĭuplicate observations. This user-written command is niceīecause it creates a variable that captures all the information needed to The secondĮxample will use a user-written program. Theįirst example will use commands available in base Stata. There are two methods available for this task.
This Stata FAQ shows how to check if a dataset has duplicate