Using random data may lead to unrepeatable results. You can mitigate this by logging (or otherwise recording) every random choice you make, and then playing those choices back. That could be as easy as recording the initial seed to your random number generator, assuming your data does not change over time.
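As a minimal sketch of that seed-logging idea (the function name and logging setup here are illustrative, not from the source):

```python
import logging
import random

logging.basicConfig(level=logging.INFO)

def make_rng(seed=None):
    """Create a reproducible RNG, logging the seed so a failing
    run can be replayed later with the same random choices."""
    if seed is None:
        seed = random.randrange(2**32)  # pick a fresh seed for this run
    logging.info("random seed: %d", seed)
    return random.Random(seed)

rng = make_rng()          # normal run: the seed appears in the log
replay = make_rng(12345)  # replay run: feed the logged seed back in
```

Two runs constructed with the same seed produce the same sequence of choices, which is what makes the failure repeatable.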
It can be hard to write tests in a general enough way to deal with arbitrary, randomly selected data. This is the harder problem. Choosing a different random integer is one thing; choosing a different user (with properties that vary from one user to the next) is something else. Consider how a test works in the abstract: it chooses some inputs, applies them to a function, and then verifies that the result is correct. There are a few ways to approach that:
- The developer precalculates the expected result and hardcodes it into the test. That only works if the developer decides on the inputs ahead of time.
- The test uses the inputs to calculate the expected result, and then compares that to the actual result. If the test accepts arbitrary (random) inputs, this means the test needs to replicate a lot of the logic in the system under test. This is almost certainly not what you want, especially for a complicated system. You are just as likely as the developer to implement a complicated algorithm in a buggy way.
- The test uses some other means of verifying that the result matches the input. Sometimes there are shortcuts, or at least alternatives, you can take to verify a result. As a naive example, if a system sums a list of numbers, you might verify the result by subtracting each number in the list from the sum, and then checking whether what remains is zero. Most systems cannot be tested this way.
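The last approach, applied to the summing example above, might look like this (the function names are hypothetical):

```python
import random

def total(numbers):
    # The system under test: sums a list of numbers.
    return sum(numbers)

def test_total_with_random_input():
    rng = random.Random(42)
    numbers = [rng.randint(-1000, 1000) for _ in range(100)]
    result = total(numbers)
    # Verify by inverting the operation: subtract every input
    # from the result and check that nothing is left over.
    remainder = result
    for n in numbers:
        remainder -= n
    assert remainder == 0

test_total_with_random_input()
```

The test never computes the expected sum itself; it only checks that the result is consistent with the inputs, so it works no matter which random numbers were chosen.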
A good place to use randomly selected data is in a comparator test, where you compare two versions of the same system using the same inputs. A comparator test will not tell you whether a system is correct, but it will help you find changes in behavior. That might be something for you to consider.
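A comparator test might be sketched like this, assuming two hypothetical versions of a sorting routine (the names `sort_v1` and `sort_v2` are placeholders, not from the source):

```python
import random

def sort_v1(items):
    # Existing version of the system.
    return sorted(items)

def sort_v2(items):
    # Candidate replacement being compared against v1.
    out = list(items)
    out.sort()
    return out

def comparator_test(trials=100):
    rng = random.Random(0)  # seeded so failures are repeatable
    for _ in range(trials):
        items = [rng.randint(0, 99) for _ in range(rng.randint(0, 20))]
        # Same random inputs through both versions; any divergence
        # is a change in behavior, not necessarily a bug.
        assert sort_v1(items) == sort_v2(items), items

comparator_test()
```

Note that if both versions share the same bug, the test passes anyway; it detects differences, not incorrectness.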